x86: synth filter float: implement SSE2 version

Multimedia / Libav - Christophe Gisquet [gmail.com] - 28 February 2014 06:00 UTC

Timings for Arrandale:
          C    SSE
win32: 2108    334
win64: 1152    322
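
As a quick reading of these figures, the ratio of the C timing to the SSE timing gives the speedup on each target: roughly 6.3x on win32 and 3.6x on win64. The sketch below only reproduces that arithmetic; the helper names `speedup` and `print_speedups` are hypothetical and are not part of the commit.

```c
#include <stdio.h>

/* Speedup of the SIMD version over the C version, computed from the
 * two timings. Hypothetical helper, not part of libavcodec. */
double speedup(double c_timing, double simd_timing)
{
    return c_timing / simd_timing;
}

/* Print the ratios for the Arrandale timings quoted in the commit message. */
void print_speedups(void)
{
    printf("win32: %.2fx\n", speedup(2108.0, 334.0));
    printf("win64: %.2fx\n", speedup(1152.0, 322.0));
}
```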

Factorizing the inner loop with a call/jmp costs more than 15 cycles, even when the jmp destination is aligned.

Unrolling for ARCH_X86_64 gains 20 cycles.
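
To illustrate what unrolling means here, the following is a minimal plain-C sketch of a multiply-accumulate loop written both rolled and unrolled by two. It is not the code from dcadsp.asm, which is hand-written assembly; the function names `mac_rolled` and `mac_unrolled2` are hypothetical and chosen only for this example.

```c
#include <stddef.h>

/* Rolled form: one multiply-accumulate per loop iteration. */
float mac_rolled(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

/* Unrolled by two: the loop counter and branch are paid once per two
 * elements. The accumulation order matches the rolled form. */
float mac_unrolled2(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        acc += a[i]     * b[i];
        acc += a[i + 1] * b[i + 1];
    }
    if (i < n)                  /* leftover element when n is odd */
        acc += a[i] * b[i];
    return acc;
}
```

Both functions return the same value for the same input. The unrolled form trades code size for fewer loop-control instructions per element, which is the kind of trade-off the timings above measure.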

08e3ea6 x86: synth filter float: implement SSE2 version
libavcodec/synth_filter.c | 1 +
libavcodec/synth_filter.h | 1 +
libavcodec/x86/dcadsp.asm | 152 ++++++++++++++++++++++++++++++++++++++++++
libavcodec/x86/dcadsp_init.c | 28 ++++++++
4 files changed, 182 insertions(+)

Upstream: git.libav.org

Related Libav Activity

x86: dcadsp: implement SSE lfe_dir
Christophe Gisquet

x86: dcadsp: implement int8x8_fmul_int32
Christophe Gisquet

Recent Libav Activity

rtsp: add pkt_size option
Tristan Matthews

aarch64: vp8: Optimize vp8_idct_add_neon for aarch64
Martin Storsjö

aarch64: vp8: Fix assembling with clang
Martin Storsjö

libavcodec: vp8 neon optimizations for aarch64
Magnus Röös

h264/aarch64: add intra loop filter neon asm
Janne Grunau
