remap: Add ARM NEON optimized remapping and rearrange code

System Internals / PulseAudio - Peter Meerwald [bct-electronic.com] - 25 May 2014 11:13 UTC

v7:
- clean up and reduce code; add 4->4 channel mappings, add rearrange code
v6:
- rename mono_to_stereo_float_neon_a9() to mono_to_stereo_float_arm_generic(); note that Cortex-A8 and -A9/A15 are different, later chips do not benefit from NEON memory transfers
v5:
- 4-channel remapping
- use vrhadd instruction, fix int16 overflow for to-mono case
v4:
- fix for sample length < 4
v3:
- fix test code: init float and int map_table
- different code path for Cortex-A8 and later (-A9, A15, unknown)
- convert from intrinsics to inline assembly
v2:
- add ARM NEON stereo-to-mono remapping code
- static __attribute__ ((noinline)) is necessary to prevent inlining and work around gcc 4.6 ICE, see https://bugs.launchpad.net/bugs/936863
- call test code, the reference implementation is obtained using pa_get_init_remap_func()
- remove check for NEON flags
v1:
- ARM NEON mono-to-stereo remapping code
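
The v5 and v4 entries above mention using vrhadd to avoid an int16 overflow in the to-mono case and a fix for sample lengths below 4. Below is a minimal scalar sketch of both ideas in portable C. It is not the patch's inline assembly: rhadd_s16() and stereo_to_mono_s16_sketch() are hypothetical names, and the helper only models what a rounding halving add computes, assuming that a right shift of a negative int is arithmetic (as documented for GCC).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a rounding halving add on s16: (a + b + 1) >> 1.
 * The sum is formed in 32 bits, so it cannot overflow the way a plain
 * int16 addition of two large samples would.
 * Assumption: >> on a negative int32_t is an arithmetic shift. */
static int16_t rhadd_s16(int16_t a, int16_t b) {
    return (int16_t) (((int32_t) a + (int32_t) b + 1) >> 1);
}

/* Hypothetical helper: average interleaved stereo frames into mono.
 * n is the number of frames; src holds 2 * n samples, dst holds n. */
void stereo_to_mono_s16_sketch(int16_t *dst, const int16_t *src, size_t n) {
    size_t i = 0;

    assert(n == 0 || (dst != NULL && src != NULL));

    /* Main body: four frames per iteration, mirroring a 4-lane vector loop.
     * It is skipped entirely when n < 4. */
    for (; i + 4 <= n; i += 4) {
        for (size_t j = 0; j < 4; j++)
            dst[i + j] = rhadd_s16(src[2 * (i + j)], src[2 * (i + j) + 1]);
    }

    /* Tail: the remaining 0..3 frames, so short buffers are still handled. */
    for (; i < n; i++)
        dst[i] = rhadd_s16(src[2 * i], src[2 * i + 1]);
}
```

Calling stereo_to_mono_s16_sketch() with n = 3 exercises only the tail loop, which is the situation the v4 fix refers to.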

Note that orig is the time of the special-case C implementation where available, not the generic matrix remapping implementation.
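
To illustrate the distinction between a generic matrix remap and a special-case path, here is a rough sketch in C. Both functions, their names, and the row-major map layout are assumptions made for illustration and are not taken from the PulseAudio sources.

```c
#include <assert.h>
#include <stddef.h>

/* Generic path (hypothetical): every output channel is a weighted sum of
 * all input channels. map is row-major: map[oc * in_ch + ic]. */
void remap_matrix_float_sketch(float *dst, const float *src, size_t n,
                               unsigned in_ch, unsigned out_ch,
                               const float *map) {
    assert(in_ch > 0 && out_ch > 0);

    for (size_t i = 0; i < n; i++) {
        for (unsigned oc = 0; oc < out_ch; oc++) {
            float acc = 0.0f;
            for (unsigned ic = 0; ic < in_ch; ic++)
                acc += map[oc * in_ch + ic] * src[i * in_ch + ic];
            dst[i * out_ch + oc] = acc;
        }
    }
}

/* Special-case path (hypothetical): mono to stereo needs no multiplies,
 * each input sample is simply written to both output channels. */
void mono_to_stereo_float_sketch(float *dst, const float *src, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[2 * i] = src[i];
        dst[2 * i + 1] = src[i];
    }
}
```

With a 2x1 map of {1.0f, 1.0f}, the generic function produces the same output as the special case but does more work per sample, which is why the choice of baseline matters when reading the timings.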

on ARM Cortex-A8 (TI OMAP3 DM3730 @ 1GHz) (Linaro GCC 4.6):

Checking NEON remap (float, mono->stereo)
func: 757474 usec (avg: 7574.74, min = 6165, max = 11963, stddev = 1479.71).
orig: 784882 usec (avg: 7848.82, min = 6835, max = 17639, stddev = 1656.01).
Checking NEON remap (float, mono->4-channel)
func: 1545507 usec (avg: 15455.1, min = 6531, max = 30609, stddev = 2689.6).
orig: 2601413 usec (avg: 26014.1, min = 22796, max = 52979, stddev = 3281.84).
Checking NEON remap (s16, mono->stereo)
func: 343844 usec (avg: 3438.44, min = 1709, max = 8880, stddev = 1180.1).
orig: 474460 usec (avg: 4744.6, min = 4212, max = 7751, stddev = 1069.29).
Checking NEON remap (s16, mono->4-channel)
func: 736574 usec (avg: 7365.74, min = 3784, max = 11902, stddev = 1637.79).
orig: 1062772 usec (avg: 10627.7, min = 7630, max = 17517, stddev = 3011.44).
Checking NEON remap (float, stereo->mono)
func: 571412 usec (avg: 5714.12, min = 4608, max = 15808, stddev = 2131.7).
orig: 4356630 usec (avg: 43566.3, min = 41596, max = 52430, stddev = 2056.79).
Checking NEON remap (float, 4-channel->mono)
func: 1443202 usec (avg: 14432, min = 12298, max = 32349, stddev = 3300).
orig: 9273410 usec (avg: 92734.1, min = 81940, max = 184265, stddev = 23310).
Checking NEON remap (s16, stereo->mono)
func: 185761 usec (avg: 1857.61, min = 1556, max = 4975, stddev = 743.681).
orig: 1204776 usec (avg: 12047.8, min = 10711, max = 16022, stddev = 1596.88).
Checking NEON remap (s16, 4-channel->mono)
func: 482912 usec (avg: 4829.12, min = 4241, max = 9980, stddev = 1270.8).
orig: 1692050 usec (avg: 16920.5, min = 14679, max = 30060, stddev = 2760.7).
Checking NEON remap (float, 4-channel->4-channel)
func: 5324471 usec (avg: 53244.7, min = 49774, max = 87036, stddev = 4255.47).
orig: 73674628 usec (avg: 736746, min = 720338, max = 824128, stddev = 18361.8).
Checking NEON remap (s16, 4-channel->4-channel)
func: 5321320 usec (avg: 53213.2, min = 49591, max = 84443, stddev = 3931.49).
orig: 24122021 usec (avg: 241220, min = 233337, max = 291687, stddev = 9064.31).

Checking NEON remap (float, stereo rearrange)
func: 1116547 usec (avg: 11165.5, min = 9124, max = 27496, stddev = 3345.63).
orig: 1385011 usec (avg: 13850.1, min = 12237, max = 18005, stddev = 1793.05).
Checking NEON remap (s16, stereo rearrange)
func: 517027 usec (avg: 5170.27, min = 4577, max = 9735, stddev = 1215.23).
orig: 1208435 usec (avg: 12084.4, min = 10406, max = 25299, stddev = 2512.02).
Checking NEON remap (float, 4-channel rearrange)
func: 1564667 usec (avg: 15646.7, min = 13855, max = 20172, stddev = 1766.48).
orig: 2970000 usec (avg: 29700, min = 26215, max = 45654, stddev = 2351.07).
Checking NEON remap (s16, 4-channel rearrange)
func: 1088808 usec (avg: 10888.1, min = 9064, max = 23407, stddev = 2465.82).
orig: 1908416 usec (avg: 19084.2, min = 16968, max = 22705, stddev = 1637.46).
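
For reading the figures above, the relative speedup of a case is simply orig divided by func. A small sketch of that arithmetic in C follows; speedup() is a hypothetical helper, not part of the test code.

```c
#include <assert.h>

/* Hypothetical helper: ratio of reference time to optimized time.
 * A value above 1.0 means the NEON path (func) was faster than orig. */
double speedup(double orig_usec, double func_usec) {
    assert(func_usec > 0.0);
    return orig_usec / func_usec;
}
```

Using the total times reported above, speedup(4356630.0, 571412.0) is roughly 7.6 for the float stereo->mono case, and speedup(73674628.0, 5324471.0) is roughly 13.8 for the float 4-channel->4-channel case.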

54a10eb remap: Add ARM NEON optimized remapping and rearrange code
src/Makefile.am | 6 +-
src/pulsecore/remap_neon.c | 498 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 502 insertions(+), 2 deletions(-)

Upstream: cgit.freedesktop.org

