Based on the aarch64 inline asm. CPU cycle counts on a Cortex-A9, compared with the C version compiled by gcc 4.8.2:
  before: 475 decicycles in get_cabac_noinline, 67106035 runs, 2829 skips
  after:  393 decicycles in get_cabac_noinline, 67106474 runs, 2390 skips
The overall speedup is above 2%. Code generated by clang 3.4 is slower on the same hardware, and the relative improvement over it is slightly larger.
634d9d8 arm: get_cabac inline asm
libavcodec/arm/cabac.h | 102 ++++++++++++++++++++++++++++++++++++++++++
libavcodec/cabac_functions.h | 3 ++
2 files changed, 105 insertions(+)
Upstream: git.libav.org