x86: synth filter float: implement SSE2 version

Multimedia / Libav - Christophe Gisquet [gmail.com] - 28 February 2014 06:00 UTC

Timings for Arrandale:
          C    SSE
win32:  2108   334
win64:  1152   322

Factorizing the inner loop with a call/jmp costs more than 15 cycles, even
with the jmp destination aligned.

Unrolling for ARCH_X86_64 gains 20 cycles.
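
For readers unfamiliar with the trade-off, here is a minimal C sketch of what unrolling a multiply-accumulate loop looks like. It is a generic illustration, not the assembly added by this commit: the function names mac_rolled and mac_unrolled2 are hypothetical, and the unroll factor of two is an assumption chosen for brevity.

```c
#include <stddef.h>

/* Rolled reference: one multiply-accumulate per loop iteration.
 * Hypothetical helper, not part of libavcodec. */
static float mac_rolled(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

/* Unrolled by two: each iteration handles two elements and keeps two
 * independent accumulators, so the loop overhead (compare, branch,
 * index update) is paid half as often. The extra accumulator needs an
 * extra register. */
static float mac_unrolled2(const float *a, const float *b, size_t n)
{
    float acc0 = 0.0f, acc1 = 0.0f;
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        acc0 += a[i]     * b[i];
        acc1 += a[i + 1] * b[i + 1];
    }
    /* Tail: handle the last element when n is odd. */
    if (i < n)
        acc0 += a[i] * b[i];

    return acc0 + acc1;
}
```

The two versions add the products in a different order, so their results can differ in the last bits for general float input. They match exactly only when every intermediate sum is representable.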

08e3ea6 x86: synth filter float: implement SSE2 version
libavcodec/synth_filter.c | 1 +
libavcodec/synth_filter.h | 1 +
libavcodec/x86/dcadsp.asm | 152 ++++++++++++++++++++++++++++++++++++++++++
libavcodec/x86/dcadsp_init.c | 28 ++++++++
4 files changed, 182 insertions(+)

Upstream: git.libav.org

