aarch64: Make transpose_4x4H do a regular transpose

Multimedia / Libav - Martin Storsjö [martin.st] - 26 March 2016 14:25 UTC

Previously, ff_h264_idct_add_neon (originally in the arm version) used a non-regular transpose so that more instructions could operate on registers as 128-bit register pairs. The aarch64 translation doesn't exploit this to the same extent, but it kept the same structure since it was a straight translation.

This reshuffles ff_h264_idct_add_neon, bringing it closer to the C implementation, making the transpose_4x4H macro do a regular transpose, usable for other algorithms as well.

Previously, the third and fourth outputs from transpose_4x4H were swapped, and prior to cc29d96d5a, the third and fourth inputs were swapped as well. In addition to swapping the outputs back, this also renumbers the intermediate registers for better readability (making the register order match transpose_4x8B).
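As an illustration of the difference between a regular transpose and one with the third and fourth outputs swapped, here is a minimal plain-C sketch. It is not the macro from neon.S: the helper names (trn_4h, trn_2s, transpose_4x4h_regular, transpose_4x4h_swapped) are hypothetical, each 4-element row stands in for the low 64 bits of one NEON register, and the two-stage TRN1/TRN2 structure is an assumption about how such a transpose is typically built.

```c
#include <stdint.h>
#include <string.h>

/* Models TRN1/TRN2 on .4h lanes: "even" gets the even lanes of a and b
 * interleaved (TRN1), "odd" gets the odd lanes interleaved (TRN2). */
static void trn_4h(const int16_t a[4], const int16_t b[4],
                   int16_t even[4], int16_t odd[4])
{
    even[0] = a[0]; even[1] = b[0]; even[2] = a[2]; even[3] = b[2];
    odd[0]  = a[1]; odd[1]  = b[1]; odd[2]  = a[3]; odd[3]  = b[3];
}

/* Models TRN1/TRN2 on .2s lanes: each 32-bit lane is a pair of 16-bit
 * elements. "lo" gets lane 0 of a and b (TRN1), "hi" gets lane 1 (TRN2). */
static void trn_2s(const int16_t a[4], const int16_t b[4],
                   int16_t lo[4], int16_t hi[4])
{
    lo[0] = a[0]; lo[1] = a[1]; lo[2] = b[0]; lo[3] = b[1];
    hi[0] = a[2]; hi[1] = a[3]; hi[2] = b[2]; hi[3] = b[3];
}

/* Regular transpose: afterwards m[i][j] holds the old m[j][i]. */
static void transpose_4x4h_regular(int16_t m[4][4])
{
    int16_t t0[4], t1[4], t2[4], t3[4];

    trn_4h(m[0], m[1], t0, t1);   /* stage 1: 16-bit interleave */
    trn_4h(m[2], m[3], t2, t3);
    trn_2s(t0, t2, m[0], m[2]);   /* stage 2: 32-bit interleave */
    trn_2s(t1, t3, m[1], m[3]);
}

/* Illustration of the old behaviour described above: the same transpose,
 * but with the third and fourth outputs exchanged. */
static void transpose_4x4h_swapped(int16_t m[4][4])
{
    int16_t tmp[4];

    transpose_4x4h_regular(m);
    memcpy(tmp,  m[2], sizeof tmp);
    memcpy(m[2], m[3], sizeof tmp);
    memcpy(m[3], tmp,  sizeof tmp);
}
```

With the regular variant, a caller can treat output row i as column i of the input without any special casing, which is what makes the macro reusable by other algorithms. With the swapped variant, every caller has to remember that rows 2 and 3 are exchanged.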

This runs with the same number of cycles as before.

cdb1665 aarch64: Make transpose_4x4H do a regular transpose
libavcodec/aarch64/h264idct_neon.S | 24 ++++++++++++------------
libavcodec/aarch64/neon.S | 12 ++++++------
2 files changed, 18 insertions(+), 18 deletions(-)

Upstream: git.libav.org
