ARMv6: Add fast path for src_x888_0565

Graphics / Pixman - Ben Avison [riscosopen.org] - 1 May 2014 14:11 EDT

Benchmark results, "before" is upstream/master 5f661ee719be25c3aa0eb0d45e0db23a37e76468, and "after" contains this patch on top.

lowlevel-blt-bench, src_8888_0565, 100 iterations:

           Before          After
      Mean   StdDev   Mean    StdDev  Confidence  Change
L1    25.9   0.20     115.6   0.70    100.00%     +347.1%
L2    14.4   0.23      52.7   3.48    100.00%     +265.0%
M     14.1   0.01      79.8   0.17    100.00%     +465.9%
HT    10.2   0.03      32.9   0.31    100.00%     +221.2%
VT     9.8   0.03      29.8   0.25    100.00%     +203.4%
R      9.4   0.03      27.8   0.18    100.00%     +194.7%
RT     4.6   0.04      10.9   0.29    100.00%     +135.9%

At most 19 outliers rejected per test per set.

cairo-perf-trace results with trimmed traces showed no significant difference.

A system-wide perf_3.10 profile on Raspbian shows significant differences in the X server CPU usage. The following were measured from a 130x62 char lxterminal running 'dmesg' every 0.5 seconds for roughly 30 seconds. These profiles are libpixman.so symbols only.

Before:

Samples: 63K of event 'cpu-clock', Event count (approx.): 2941348112, DSO: libpixman-1.so.0.33.1
  37.77%  Xorg  [.] fast_fetch_r5g6b5
  14.39%  Xorg  [.] pixman_composite_over_n_8_8888_asm_armv6
   8.51%  Xorg  [.] fast_write_back_r5g6b5
   7.38%  Xorg  [.] pixman_composite_src_8888_8888_asm_armv6
   4.39%  Xorg  [.] pixman_composite_add_8_8_asm_armv6
   3.69%  Xorg  [.] pixman_composite_src_n_8888_asm_armv6
   2.53%  Xorg  [.] _pixman_image_validate
   2.35%  Xorg  [.] pixman_image_composite32

After:

Samples: 31K of event 'cpu-clock', Event count (approx.): 3619782704, DSO: libpixman-1.so.0.33.1
  22.36%  Xorg  [.] pixman_composite_over_n_8_8888_asm_armv6
  13.59%  Xorg  [.] pixman_composite_src_x888_0565_asm_armv6
  12.75%  Xorg  [.] pixman_composite_src_8888_8888_asm_armv6
   6.79%  Xorg  [.] pixman_composite_add_8_8_asm_armv6
   5.95%  Xorg  [.] pixman_composite_src_n_8888_asm_armv6
   4.12%  Xorg  [.] pixman_image_composite32
   3.69%  Xorg  [.] _pixman_image_validate
   3.65%  Xorg  [.] _pixman_bits_image_setup_accessors

Before, fast_fetch_r5g6b5 + fast_write_back_r5g6b5 took 46% of the samples in libpixman, and probably incurred some memcpy() load, too. After, pixman_composite_src_x888_0565_asm_armv6 takes 14%. Note that the sample counts differ considerably between the two runs, since less time is spent in Pixman and the running time is not exactly the same.
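For readers unfamiliar with the operation name, the per-pixel work of a src_x888_0565 composite can be sketched in plain C as below. This is only an illustrative scalar sketch, not the ARMv6 assembly added by this patch. The function names x888_to_0565 and src_x888_0565_row are hypothetical, and the sketch assumes conversion by truncating each channel to its top bits and ignoring the unused high byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert one x8r8g8b8 pixel to r5g6b5 by
 * keeping the top 5/6/5 bits of red/green/blue. The top byte of
 * the source (x or alpha) is ignored. */
static inline uint16_t x888_to_0565(uint32_t s)
{
    return (uint16_t)(((s >> 8) & 0xF800) |  /* red:   bits 23..19 -> 15..11 */
                      ((s >> 5) & 0x07E0) |  /* green: bits 15..10 -> 10..5  */
                      ((s >> 3) & 0x001F));  /* blue:  bits  7..3  ->  4..0  */
}

/* Hypothetical scanline loop: a SRC operator simply overwrites the
 * destination, so no destination read or blending is needed. */
static void src_x888_0565_row(uint16_t *dst, const uint32_t *src, size_t width)
{
    for (size_t i = 0; i < width; i++)
        dst[i] = x888_to_0565(src[i]);
}
```

A dedicated routine of this shape handles each scanline in a single pass, whereas the generic path in the "before" profile went through separate fetch and write-back steps.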

Furthermore, in the above test, the CPU idle function accounted for 9% of the samples before and 15% after.

v4, Pekka Paalanen: Re-benchmarked on Raspberry Pi, commit message.

91f32ce ARMv6: Add fast path for src_x888_0565
pixman/pixman-arm-simd-asm.S | 77 ++++++++++++++++++++++++++++++++++++++++++
pixman/pixman-arm-simd.c | 7 ++++
2 files changed, 84 insertions(+)

Upstream: cgit.freedesktop.org

