We used the SSE reciprocal square root instruction to vectorize the search rather than comparing candidates one at a time with multiplies. This speeds up the entire encoder by 8-10%.
76674fe SSE2 implementation of the PVQ search
celt/bands.c | 2 +-
celt/tests/test_unit_mathops.c | 1 +
celt/tests/test_unit_rotation.c | 1 +
celt/vq.c | 34 ++++--
celt/vq.h | 12 ++-
celt/x86/vq_sse.h | 50 +++++++++
celt/x86/vq_sse2.c | 217 +++++++++++++++++++++++++++++++++++++++
celt/x86/x86_celt_map.c | 13 +++
celt_headers.mk | 1 +
celt_sources.mk | 2 +-
10 files changed, 320 insertions(+), 13 deletions(-)
Upstream: git.xiph.org
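The commit message above only names the trick. The C below is a rough sketch of the idea and not the code in celt/x86/vq_sse2.c. The function names pvq_best_index_scalar and pvq_best_index_sse are hypothetical, as is the setup: X[j] is the correlation gained by adding a pulse at position j, y[j] is the energy it adds, and xy/yy are the current totals. The scalar version picks the j that maximizes (xy + X[j])^2 / (yy + y[j]) by cross-multiplying against the best candidate so far, one position at a time. The SSE version instead scores four positions at once as (xy + X[j]) * rsqrt(yy + y[j]) with _mm_rsqrt_ps. For non-negative numerators this ranks candidates the same way. It assumes x86 with SSE, N a multiple of 4, non-negative inputs, and yy + y[j] > 0.

```c
#include <xmmintrin.h> /* SSE: _mm_rsqrt_ps, _mm_max_ps, _mm_movemask_ps */

/* Hypothetical scalar baseline: compare one candidate at a time.
 * Rxy^2/Ryy > best_num/best_den is tested as Rxy^2*best_den > best_num*Ryy
 * so no division or square root is needed. */
int pvq_best_index_scalar(const float *X, const float *y,
                          float xy, float yy, int N)
{
    float best_num = -1.0f; /* best squared correlation so far */
    float best_den = 0.0f;  /* its energy; 0 guarantees the first candidate wins */
    int best = 0, j;
    for (j = 0; j < N; j++) {
        float Rxy = xy + X[j];
        float Ryy = yy + y[j];
        Rxy = Rxy * Rxy;
        if (Rxy * best_den > best_num * Ryy) {
            best_num = Rxy;
            best_den = Ryy;
            best = j;
        }
    }
    return best;
}

/* Score four candidates at once: (xy + X[j]) * rsqrt(yy + y[j]). */
static __m128 score4(const float *X, const float *y, __m128 xy4, __m128 yy4)
{
    __m128 num = _mm_add_ps(xy4, _mm_loadu_ps(X));
    __m128 den = _mm_add_ps(yy4, _mm_loadu_ps(y));
    return _mm_mul_ps(num, _mm_rsqrt_ps(den)); /* ~12-bit accurate rsqrt */
}

/* Hypothetical SSE search. Assumes N % 4 == 0 and yy + y[j] > 0. */
int pvq_best_index_sse(const float *X, const float *y,
                       float xy, float yy, int N)
{
    const __m128 xy4 = _mm_set1_ps(xy);
    const __m128 yy4 = _mm_set1_ps(yy);
    __m128 max4 = _mm_set1_ps(-1.0f); /* below any non-negative score */
    int j;

    /* Pass 1: running per-lane maximum of the scores. */
    for (j = 0; j < N; j += 4)
        max4 = _mm_max_ps(max4, score4(X + j, y + j, xy4, yy4));

    /* Horizontal max: after these two steps every lane holds the overall max. */
    max4 = _mm_max_ps(max4, _mm_shuffle_ps(max4, max4, _MM_SHUFFLE(1, 0, 3, 2)));
    max4 = _mm_max_ps(max4, _mm_shuffle_ps(max4, max4, _MM_SHUFFLE(2, 3, 0, 1)));

    /* Pass 2: recompute the scores and return the first index equal to the max. */
    for (j = 0; j < N; j += 4) {
        __m128 hit = _mm_cmpeq_ps(score4(X + j, y + j, xy4, yy4), max4);
        int mask = _mm_movemask_ps(hit);
        if (mask != 0) {
            int lane = 0;
            while (!(mask & (1 << lane)))
                lane++;
            return j + lane;
        }
    }
    return 0; /* not reached for inputs that satisfy the assumptions */
}
```

Recomputing the scores in a second pass avoids tracking a winning index per lane. Because _mm_rsqrt_ps is only an approximation, near-ties can resolve differently from the exact scalar comparison. For a greedy pulse search that is usually an acceptable trade for the throughput gain.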