This improved memcmp provides a fast path for compares of up to 16 bytes and then compares 16 bytes at a time, so each iteration loads 16 bytes from both sources. The glibc memcmp microbenchmark shows unchanged performance (within ~1ns) for smaller compare sizes and a reduction in execution time of up to 31% for compares of up to 4K on the APM Mustang. On Qualcomm Falkor the reduction is almost 48%, i.e. an almost 2x speedup (1/(1 - 0.48) ≈ 1.9) for sizes of 2K and above.
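A portable C sketch of the scheme may make it easier to follow. It is not a transcription of the AArch64 assembly in sysdeps/aarch64/memcmp.S. It assumes only what is described above: a fast path for at most 16 bytes and a main loop that loads 16 bytes from each source per iteration (here as two 8-byte loads). The overlapping tail load is an assumption, and the names sketch_memcmp, load64 and byte_diff are hypothetical, chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the difference of the first mismatching
   bytes (as unsigned char), or 0 if the n bytes are equal. */
static int byte_diff(const unsigned char *a, const unsigned char *b, size_t n) {
  for (size_t i = 0; i < n; i++)
    if (a[i] != b[i])
      return a[i] - b[i];
  return 0;
}

/* Hypothetical helper: unaligned-safe 8-byte load. */
static uint64_t load64(const unsigned char *p) {
  uint64_t v;
  memcpy(&v, p, sizeof v);
  return v;
}

/* Hypothetical sketch of a 16-bytes-at-a-time memcmp. */
int sketch_memcmp(const void *s1, const void *s2, size_t n) {
  const unsigned char *a = s1, *b = s2;

  /* Fast path: compares of up to 16 bytes skip the main loop. */
  if (n <= 16) {
    if (n >= 8) {
      /* Two possibly overlapping 8-byte loads cover 8..16 bytes; any
         overlapping bytes were already found equal by the first load. */
      if (load64(a) != load64(b))
        return byte_diff(a, b, 8);
      return byte_diff(a + n - 8, b + n - 8, 8);
    }
    return byte_diff(a, b, n);
  }

  /* Main loop: 16 bytes per iteration from each source. */
  while (n >= 16) {
    if (load64(a) != load64(b) || load64(a + 8) != load64(b + 8))
      return byte_diff(a, b, 16);
    a += 16;
    b += 16;
    n -= 16;
  }

  /* Tail (assumed technique): re-compare the last 16 bytes, overlapping
     bytes already known to be equal, so no byte-by-byte loop is needed. */
  if (n > 0)
    return byte_diff(a + n - 16, b + n - 16, 16);
  return 0;
}
```

On a mismatch, the sketch rescans the differing chunk byte by byte, which keeps the sign correct regardless of endianness. An optimized implementation would locate the first differing byte directly from the loaded words, for example with a byte reversal followed by a count-leading-zeros on little-endian.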
- sysdeps/aarch64/memcmp.S: Widen comparison to 16 bytes at a time.
30a81dae5b aarch64: Optimized memcmp for medium to large sizes
ChangeLog | 3 ++
sysdeps/aarch64/memcmp.S | 76 +++++++++++++++++++++++++++++++++++-------------
2 files changed, 58 insertions(+), 21 deletions(-)