vmx: align destination to fix valgrind invalid memory writes

Graphics / Pixman - Siarhei Siamashka [gmail.com] - 1 October 2013 15:42 UTC

The SIMD optimized inner loops in the VMX/Altivec code are trying to emulate unaligned accesses to the destination buffer. For each 4 pixels (which fit into a 128-bit register) the current implementation: 1. first performs two aligned reads, which cover the needed data 2. reshuffles bytes to get the needed data in a single vector register 3. does all the necessary calculations 4. reshuffles bytes back to their original location in two registers 5. performs two aligned writes back to the destination buffer

Unfortunately in the case if the destination buffer is unaligned and the width is a perfect multiple of 4 pixels, we may have some writes crossing the boundaries of the destination buffer. In a multithreaded environment this may potentially corrupt the data outside of the destination buffer if it is concurrently read and written by some other thread.

The valgrind report for blitters-test is full of:

==23085== Invalid write of size 8
==23085== at 0x1004B0B4: vmx_combine_add_u (pixman-vmx.c:1089)
==23085== by 0x100446EF: general_composite_rect (pixman-general.c:214)
==23085== by 0x10002537: test_composite (blitters-test.c:363)
==23085== by 0x1000369B: fuzzer_test_main._omp_fn.0 (utils.c:733)
==23085== by 0x10004943: fuzzer_test_main (utils.c:728)
==23085== by 0x10002C17: main (blitters-test.c:397)
==23085== Address 0x5188218 is 0 bytes after a block of size 88 alloc'd
==23085== at 0x4051DA0: memalign (vg_replace_malloc.c:581)
==23085== by 0x4051E7B: posix_memalign (vg_replace_malloc.c:709)
==23085== by 0x10004CFF: aligned_malloc (utils.c:833)
==23085== by 0x10001DCB: create_random_image (blitters-test.c:47)
==23085== by 0x10002263: test_composite (blitters-test.c:283)
==23085== by 0x1000369B: fuzzer_test_main._omp_fn.0 (utils.c:733)
==23085== by 0x10004943: fuzzer_test_main (utils.c:728)
==23085== by 0x10002C17: main (blitters-test.c:397)

This patch addresses the problem by first aligning the destination buffer at a 16 byte boundary in each combiner function. This trick is borrowed from the pixman SSE2 code.

It allows to pass the new thread-test on PowerPC VMX/Altivec systems and also resolves the "make check" failure reported for POWER7 hardware: http://lists.freedesktop.org/archives/pixman/2013-August/002871.html

b6c5ba0 vmx: align destination to fix valgrind invalid memory writes
pixman/pixman-vmx.c | 422 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 422 insertions(+)

Upstream: cgit.freedesktop.org


  • Share