aarch64: Improve strncmp for mutually misaligned inputs

System Internals / glibc - Siddhesh Poyarekar [sourceware.org] - 13 March 2018 18:27 EDT

Mutually misaligned inputs on aarch64 are currently compared one byte at a time, which is not very efficient. Enhance the comparison, as in strcmp, by loading a double-word at a time. The peak performance improvement (i.e. 4k maxlen comparisons) on the strncmp microbenchmark is as follows:

falkor: 3.5x (up to 72% time reduction)
cortex-a73: 3.5x (up to 71% time reduction)
cortex-a53: 3.5x (up to 71% time reduction)

All mutually misaligned inputs from 16 bytes maxlen onwards show upwards of 15% improvement and there is no measurable effect on the performance of aligned/mutually aligned inputs.
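
The double-word approach described above can be sketched in C. This is only an illustration of the idea, not the assembly in the patch: the name sketch_strncmp, the helper has_nul_byte and the 4 KiB SKETCH_PAGE_SIZE are assumptions made for this example. It aligns the first string to 8 bytes, then loads 8 bytes from each string per iteration, falling back to bytes whenever the load from the second string would cross a page boundary. Note that the wide loads may read a few bytes past the terminating NUL, which is safe at the machine level within a page but is not valid portable C unless the buffers are padded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: pages are at least 4 KiB. */
#define SKETCH_PAGE_SIZE 4096u

/* Nonzero if any byte of x is zero (classic bit trick). */
static int has_nul_byte(uint64_t x)
{
    return ((x - 0x0101010101010101ULL) & ~x & 0x8080808080808080ULL) != 0;
}

/* Hypothetical name; illustrative only, not the glibc implementation. */
int sketch_strncmp(const char *s1, const char *s2, size_t n)
{
    assert(s1 != NULL && s2 != NULL);
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    /* Compare bytes until p1 is 8-byte aligned. */
    while (n > 0 && ((uintptr_t)p1 & 7) != 0) {
        if (*p1 != *p2 || *p1 == '\0')
            return (int)*p1 - (int)*p2;
        p1++; p2++; n--;
    }

    /* Double-word loop: p1 is aligned, p2 generally is not. */
    while (n >= 8) {
        size_t page_off = (uintptr_t)p2 & (SKETCH_PAGE_SIZE - 1);
        if (page_off <= SKETCH_PAGE_SIZE - 8) {
            /* The 8 bytes at p2 stay inside one page: wide load is safe. */
            uint64_t w1, w2;
            memcpy(&w1, p1, 8);
            memcpy(&w2, p2, 8);
            if (w1 == w2 && !has_nul_byte(w1)) {
                p1 += 8; p2 += 8; n -= 8;
                continue;
            }
        }
        /* Mismatch, NUL, or page crossing: resolve these 8 bytes bytewise. */
        for (int i = 0; i < 8; i++) {
            if (*p1 != *p2 || *p1 == '\0')
                return (int)*p1 - (int)*p2;
            p1++; p2++; n--;
        }
    }

    /* Fewer than 8 bytes left within the limit. */
    while (n > 0) {
        if (*p1 != *p2 || *p1 == '\0')
            return (int)*p1 - (int)*p2;
        p1++; p2++; n--;
    }
    return 0;
}
```

Checking the page offset on every iteration keeps the sketch short; the assembly in the patch may organize this safety check differently.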

- sysdeps/aarch64/strncmp.S (count): New macro.
  (strncmp): Store misaligned length in SRC1 in COUNT.
  (mutual_align): Adjust.
  (misaligned8): Load dword at a time when it is safe.

7108f1f944 aarch64: Improve strncmp for mutually misaligned inputs
ChangeLog | 7 ++++
sysdeps/aarch64/strncmp.S | 95 +++++++++++++++++++++++++++++++++++++++--------
2 files changed, 87 insertions(+), 15 deletions(-)

Upstream: sourceware.org

