aarch64: Optimized implementation of strcpy

System Internals / glibc - Xuelei Zhang [huawei.com] - 19 December 2019 19:31 UTC

Optimize the strcpy implementation by using vector loads and operations in the main loop. Compared to aarch64/strcpy.S, it reduces the latency of cases in bench-strlen by 5% to 18% when the length of src is greater than 64 bytes, with gains throughout the benchmark.
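
As a rough illustration of the idea only, here is a minimal sketch in portable C of a copy loop that handles 16 bytes per iteration, the width of one 128-bit vector register. It is not the assembly from this patch: the names sketch_strcpy, first_nul and BLOCK are hypothetical, and aligning the source byte by byte before entering the main loop is an assumption made here to keep the block loads aligned. The real routine would use vector instructions where this sketch uses memcpy and a scalar scan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { BLOCK = 16 };  /* bytes per step: one 128-bit vector register */

/* Index of the first NUL byte in a 16-byte block, or BLOCK if there is none.
   A vector implementation would do this with a compare-against-zero
   instruction instead of a scalar loop. */
static size_t first_nul(const unsigned char *block)
{
    for (size_t i = 0; i < BLOCK; i++)
        if (block[i] == 0)
            return i;
    return BLOCK;
}

/* Hypothetical helper for illustration, not the glibc routine. */
char *sketch_strcpy(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    char *out = dst;

    /* Head (assumption): copy byte by byte until src is 16-byte aligned. */
    while (((uintptr_t)src % BLOCK) != 0) {
        if ((*out++ = *src++) == '\0')
            return dst;
    }

    /* Main loop: one aligned 16-byte load per iteration.
       Caveat: the load may read past the terminating NUL. An aligned
       16-byte read cannot cross a page boundary, which is what assembly
       routines rely on, but strict ISO C only permits it when the bytes
       belong to the same object. */
    for (;;) {
        unsigned char block[BLOCK];
        memcpy(block, src, BLOCK);

        size_t n = first_nul(block);
        if (n < BLOCK) {
            memcpy(out, block, n + 1);  /* tail, including the NUL */
            return dst;
        }

        memcpy(out, block, BLOCK);      /* full block, no NUL seen */
        out += BLOCK;
        src += BLOCK;
    }
}
```

The point of the structure is that strings longer than a few blocks spend most of their time in the main loop, where each iteration moves 16 bytes with one load, one check and one store. That would be consistent with the commit message reporting its gains for sources longer than 64 bytes.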

Checked on aarch64-linux-gnu.

0237b61526 aarch64: Optimized implementation of strcpy
sysdeps/aarch64/strcpy.S | 59 ++++++++++++++++++++++--------------------------
1 file changed, 27 insertions(+), 32 deletions(-)

Upstream: sourceware.org

