Add ARMv4/ARMv5E macros

Multimedia / Opus - Timothy B. Terriberry [xiph.org] - 19 May 2013 21:12 UTC

Original patch by Aurélien Zanelli: http://lists.xiph.org/pipermail/opus/2013-May/002078.html

Revised version:
- Add autoconf detection (ported from libtheora).
- Rename ARM5E to ARMv5E (an ARM5 is not the same thing as ARMv5!).
- Use actual macros so they can still be selectively overridden.
- Split out ARMv4 parts and add a few more ARMv4 macros.
- Label blocks to make them easy to find in generated assembly.
- Fix MULT16_32_Q15() so we can pass make check. The MDCT test passes in values larger than 2**30 for b. The new version should be just as fast (or faster, since it's easier to merge the shift with following instructions), and there's no appreciable impact on accuracy (FFT/MDCT SNR actually goes up in most cases).
- Fix register constraints. We were using early-clobber flags in a bunch of places that didn't need them, and commutative-pair flags in a bunch of places that weren't actually commutative. This was Jean-Marc's fault (the original code came from Speex).
- Simplify silk_CLZ16().
- Port over iFFT C_MULC asm by Andree Buschmann from Rockbox.
- Speed up the C_MULC asm by using LDRD, allowing more flexible addressing, re-ordering instructions to avoid some stalls, allowing more flexible register allocation, and getting things out of the inline asm block so the compiler can schedule them better.
- Add C_MUL and C_MUL4 asm for the FFT to the encoder, based on the new C_MULC.
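
As a rough illustration of three of the points above (overridable macros, labeled asm blocks, and register constraints), here is a minimal sketch and not the code from the patch itself. The function name sketch_MULT16_32_Q16_armv4, the preprocessor guard, and the stand-in generic macro are assumptions made for the example. The early-clobber flags are kept on the outputs on the assumption that pre-ARMv6 SMULL needs RdLo, RdHi and Rm in distinct registers, and the commutative flag is used only on the multiplicand pair, which really is commutative.

```c
#include <stdint.h>

/* Stand-in for the generic fixed-point definition: (a*b) >> 16.
   Relies on arithmetic right shift of negative values, as gcc provides. */
#define MULT16_32_Q16(a, b) ((int32_t)(((int64_t)(a) * (int64_t)(b)) >> 16))

/* Hypothetical guard: only use the asm when building ARM (non-Thumb) code
   with a GNU-compatible compiler. Elsewhere the generic macro stays. */
#if defined(__GNUC__) && defined(__arm__) && !defined(__thumb__)
static inline int32_t sketch_MULT16_32_Q16_armv4(int16_t a, int32_t b)
{
  uint32_t rd_lo;
  int32_t rd_hi;
  __asm__(
      "#MULT16_32_Q16\n\t"          /* label: shows up in `gcc -S` output */
      "smull %0, %1, %2, %3\n\t"    /* rd_hi:rd_lo = b * (a << 16) */
      : "=&r"(rd_lo), "=&r"(rd_hi)  /* '&': outputs must not share an input register */
      : "%r"(b),                    /* '%': this operand and the next may be swapped */
        "r"((int32_t)a * 65536)     /* a << 16 without shifting a negative value */
  );
  return rd_hi;                     /* high word equals (a * b) >> 16 */
}

/* Override via a real macro, so a later header can still #undef it. */
#undef MULT16_32_Q16
#define MULT16_32_Q16(a, b) (sketch_MULT16_32_Q16_armv4(a, b))
#endif
```

Because the override is an ordinary macro, an ARMv5E header included afterwards could #undef MULT16_32_Q16 again and substitute its own version, leaving the other ARMv4 macros in place.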

In total, this patch gives a 22.3% speed-up on test_opus_encoder on a 600 MHz Cortex A8 using gcc 4.2.1. When restricted to ARMv4 optimizations, it gives a 9.6% speed-up on the same processor/compiler.

On the conformance test vectors:
Average mono quality is 97.0583 %
Average stereo quality is 97.775 %

972a34e Add ARMv4/ARMv5E macros.
autogen.sh | 1 +
celt/_kiss_fft_guts.h | 173 ++++++++++++++++++++++++++++++++++
celt/arch.h | 8 +-
celt/fixed_armv4.h | 71 ++++++++++++++
celt/fixed_armv5e.h | 127 +++++++++++++++++++++++++
configure.ac | 32 ++++++-
m4/as-gcc-inline-assembly.m4 | 106 +++++++++++++++++++++
silk/SigProc_FIX.h | 8 ++
silk/SigProc_FIX_armv4.h | 47 ++++++++++
silk/SigProc_FIX_armv5e.h | 61 ++++++++++++
silk/macros.h | 8 ++
silk/macros_armv4.h | 103 ++++++++++++++++++++
silk/macros_armv5e.h | 213 ++++++++++++++++++++++++++++++++++++++++++
13 files changed, 953 insertions(+), 5 deletions(-)

Upstream: git.xiph.org

