This started as a trivial change to Anton's rawmemchr. I got carried away. This is a hybrid of P8's asymptotically faster 64B checks and extremely efficient small-string checks, e.g. <64B (and sometimes a little more, depending on alignment).
The second trick is to align to 64B by running a 48B checking loop 16B at a time until we naturally align to 64B (i.e. checking 48/96/144 bytes/iteration based on the alignment after the first 5 comparisons). This alleviates the need to check page boundaries.
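
For illustration only, the align-then-stride idea can be modelled in portable C. This is a simplified sketch, not the assembly in this patch: it skips the initial five comparisons and the 48B loop, and simply steps 16B at a time until the pointer is 64B aligned, then switches to 64B blocks. The names sketch_strlen and first_nul are hypothetical. It assumes, as the assembly does, that an aligned block may be read in full; aligned 16B and 64B blocks never straddle a page, which is why no page-boundary check is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: index of the first NUL in p[from..n), or n if none.
   Stands in for a single vector compare over a 16B or 64B block. */
static size_t first_nul(const unsigned char *p, size_t from, size_t n)
{
    for (size_t i = from; i < n; i++)
        if (p[i] == 0)
            return i;
    return n;
}

/* Simplified model of the alignment trick (not the patch's exact flow).
   Assumes the enclosing aligned blocks are readable, which holds for
   real pages but is outside strict ISO C for arbitrary objects. */
size_t sketch_strlen(const char *s)
{
    assert(s != NULL);
    const unsigned char *start = (const unsigned char *)s;

    /* Phase 1: round down to 16B and ignore the bytes before the string. */
    size_t skip = (size_t)((uintptr_t)start & 15);
    const unsigned char *p = start - skip;
    size_t i = first_nul(p, skip, 16);
    if (i < 16)
        return (size_t)((p + i) - start);
    p += 16;

    /* Phase 2: 16B steps until p is 64B aligned (at most three steps). */
    while (((uintptr_t)p & 63) != 0) {
        i = first_nul(p, 0, 16);
        if (i < 16)
            return (size_t)((p + i) - start);
        p += 16;
    }

    /* Phase 3: 64B-aligned blocks; each stays within one page. */
    for (;;) {
        i = first_nul(p, 0, 64);
        if (i < 64)
            return (size_t)((p + i) - start);
        p += 64;
    }
}
```

The scalar helper only marks where the vector compares would sit; the point of the sketch is the control flow, where every load after the first is aligned to its own size.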
Finally, explicitly use the P7 strlen with the runtime loader when building P9. We need to be cautious about vector/VSX extensions here on P9-only builds.
a23bd00f9d powerpc64le: add optimized strlen for P9
sysdeps/powerpc/powerpc64/le/power9/rtld-strlen.S | 1 +
sysdeps/powerpc/powerpc64/le/power9/strlen.S | 213 +++++++++++++++++++++
sysdeps/powerpc/powerpc64/multiarch/Makefile | 2 +-
.../powerpc/powerpc64/multiarch/ifunc-impl-list.c | 4 +
.../powerpc/powerpc64/multiarch/strlen-power9.S | 2 +
sysdeps/powerpc/powerpc64/multiarch/strlen.c | 5 +
6 files changed, 226 insertions(+), 1 deletion(-)