This patch significantly improves performance of strstr using a novel modified Horspool algorithm. Needles up to size 256 use a bad-character table indexed by hashed pairs of characters to quickly skip past mismatches. Long needles use a self-adapting filtering step to avoid comparing the whole needle repeatedly.
By limiting the needle length to 256, the shift table only requires 8 bits per entry, lowering preprocessing overhead and minimizing cache effects. This limit also implies worst-case performance is linear.
Small needles up to size 3 use a dedicated linear search. Very long needles use the Two-Way algorithm.
The performance gain using the improved bench-strstr on Cortex-A72 is 5.8 times basic_strstr and 3.7 times twoway_strstr.
Tested against GLIBC testsuite, randomized tests and the GNULIB strstr test (https://git.savannah.gnu.org/cgit/gnulib.git/tree/tests/test-strstr.c).
5e0a7ecb66 Improve performance of strstr
ChangeLog | 9 +++
string/str-two-way.h | 9 ++-
string/strstr.c | 165 ++++++++++++++++++++++++++++++++++++---------------
3 files changed, 132 insertions(+), 51 deletions(-)