This patch significantly improves performance of memmem using a novel modified Horspool algorithm. Needles up to size 256 use a bad-character table indexed by hashed pairs of characters to quickly skip past mismatches. Long needles use a self-adapting filtering step to avoid comparing the whole needle repeatedly.
By limiting the needle length to 256, the shift table only requires 8 bits per entry, lowering preprocessing overhead and minimizing cache effects. Because the needle length is then bounded by a constant, worst-case performance is also linear in the haystack length.
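
As an illustration of the table-driven skip described above, here is a minimal sketch. It is not the code from this patch: the names sketch_memmem and pair_hash and the particular hash function are assumptions made for the example, the self-adapting filter is left out, and needles outside the 2..256 range simply fall back to a plain scan.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical hash mapping an adjacent byte pair to a table index.  */
static inline size_t
pair_hash (unsigned char prev, unsigned char cur)
{
  return ((size_t) cur - ((size_t) prev << 3)) % 256;
}

/* Illustrative only: Horspool-style search using a pair-indexed table.  */
void *
sketch_memmem (const void *haystack, size_t hs_len,
               const void *needle, size_t ne_len)
{
  const unsigned char *hs = haystack;
  const unsigned char *ne = needle;

  if (ne_len == 0)
    return (void *) hs;
  if (ne_len > hs_len)
    return NULL;

  if (ne_len < 2 || ne_len > 256)
    {
      /* Outside the range the table handles: plain scan, for brevity.  */
      for (size_t i = 0; i + ne_len <= hs_len; i++)
        if (memcmp (hs + i, ne, ne_len) == 0)
          return (void *) (hs + i);
      return NULL;
    }

  uint8_t shift[256];
  size_t m1 = ne_len - 1;

  /* shift[h] is the largest needle index at which a pair hashing to h
     ends, or 0 if no pair in the needle hashes to h.  Since m1 <= 255,
     every entry fits in 8 bits.  */
  memset (shift, 0, sizeof shift);
  for (size_t i = 1; i < m1; i++)
    shift[pair_hash (ne[i - 1], ne[i])] = (uint8_t) i;

  /* shift1 is the skip to apply after the final pair's hash matched
     but the full comparison failed.  */
  size_t last = pair_hash (ne[m1 - 1], ne[m1]);
  size_t shift1 = m1 - shift[last];
  shift[last] = (uint8_t) m1;

  size_t pos = 0;
  while (pos + ne_len <= hs_len)
    {
      size_t end = pos + m1;
      size_t t = shift[pair_hash (hs[end - 1], hs[end])];

      if (t < m1)
        {
          /* The pair is absent from the needle (t == 0) or occurs
             earlier in it: align that occurrence and retry.  */
          pos += m1 - t;
          continue;
        }

      /* The hash of the last pair matches: verify the whole needle.  */
      if (memcmp (hs + pos, ne, ne_len) == 0)
        return (void *) (hs + pos);
      pos += shift1;
    }
  return NULL;
}
```

Hashing pairs rather than single characters makes a table hit much less likely on ordinary text, so most iterations take the skip branch and advance by close to the full needle length.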
Small needles up to size 2 use a dedicated linear search. Needles longer than 256 use the Two-Way algorithm; inlining of that path is disabled to avoid increasing stack size or slowing down the common case.
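
One plausible shape for the dedicated search on a 2-byte needle is a single pass that keeps the last two haystack bytes in a rolling 16-bit value. The sketch below is an assumption for illustration, not the code from this patch, and find_pair is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: find the first occurrence of the 2-byte needle.  */
void *
find_pair (const void *haystack, size_t hs_len, const void *needle)
{
  const unsigned char *hs = haystack;
  const unsigned char *ne = needle;

  if (hs_len < 2)
    return NULL;

  /* Pack the needle into one 16-bit value, compared once per byte.  */
  uint16_t want = (uint16_t) (ne[0] << 8 | ne[1]);
  uint16_t have = hs[0];

  for (size_t i = 1; i < hs_len; i++)
    {
      /* Shift in the next byte; the oldest byte drops off the top.  */
      have = (uint16_t) (have << 8 | hs[i]);
      if (have == want)
        return (void *) (hs + i - 1);
    }
  return NULL;
}
```

Each haystack byte is read exactly once and no table has to be built, which is why such short needles are better served by a separate path.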
Performance improves by a factor of 6.6 on English text on AArch64, using random needles with an average size of 8.
Tested against the glibc testsuite and with randomized tests.
680942b016 Improve performance of memmem
ChangeLog | 4 ++
string/memmem.c | 127 +++++++++++++++++++++++++++++++++++++-------------------
2 files changed, 89 insertions(+), 42 deletions(-)