Complete upgrade of gnu grep 2.14 => 2.20

Operating Systems / DragonFlyBSD - John Marino [marino.st] - 10 October 2014 17:51 UTC

** 2.20 Bug fixes

grep --max-count=N FILE would no longer stop reading after the Nth match. I.e., while grep would still print the correct output, it would continue reading until end of input, and hence, potentially forever. [bug introduced in grep-2.19]
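The output side of --max-count can be checked by hand. This is a minimal sketch, assuming a GNU grep that carries the fix is first on PATH; the sample input and the variable name `out` are illustrative and not part of the commit. It only shows that output stops after the requested number of matches. The regression itself affected how much input grep kept reading, which this does not measure.

```shell
# Feed grep four lines, three of which match, but ask for at most two matches.
# grep should print the first two matching lines and then stop.
out=$(printf 'alpha\nbeta\nalpha\nalpha\n' | grep --max-count=2 alpha)
printf '%s\n' "$out"
```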

A command like echo aa|grep -E 'a(b$|c$)' would mistakenly report the input as a matched line. [bug introduced in grep-2.19]
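The command from this entry can be wrapped in a small check. This is a sketch, assuming a grep with the fix; the variable name `result` is illustrative.

```shell
# "aa" ends in neither "ab" nor "ac", so a fixed grep should report
# no match (exit status 1) and print nothing.
if echo aa | grep -E 'a(b$|c$)' >/dev/null; then
  result="matched"
else
  result="no match"
fi
echo "$result"
```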

** 2.20 Changes in behavior

grep --exclude-dir='FOO/' now excludes the directory FOO. Previously, the trailing slash meant the option was ineffective.
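A throwaway directory tree is enough to see the new behavior. This is a sketch, assuming GNU grep 2.20 or later and a `mktemp` that supports -d; the directory and file names (`keep`, `hit.txt`) are made up for the example.

```shell
# Build a scratch tree in which FOO/hit.txt and keep/hit.txt both contain "needle".
tmp=$(mktemp -d)
mkdir "$tmp/FOO" "$tmp/keep"
echo needle > "$tmp/FOO/hit.txt"
echo needle > "$tmp/keep/hit.txt"

# List matching files relative to the tree root. FOO should be skipped
# even though the option value carries a trailing slash.
found=$(cd "$tmp" && grep -rl --exclude-dir='FOO/' needle .)
echo "$found"

# Clean up the scratch tree.
rm -rf "$tmp"
```

With the fix in place, only the file under `keep` should be listed.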

** 2.19 Improvements

Performance has improved, typically by 10% and in some cases by a factor of 200. However, performance of grep -P in UTF-8 locales has gotten worse as part of the fix for the crashes mentioned below.

** 2.19 Bug fixes

grep no longer mishandles patterns like [a-[.z.]], and no longer mishandles patterns like [^a] in locales that have multicharacter collating sequences so that [^a] can match a string of two characters.

grep no longer mishandles an empty pattern at the end of a pattern list. [bug introduced in grep-2.5]
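As an illustration of the expected behavior (not a reproduction of the original bug, whose exact trigger is not given here), an empty pattern matches every line, including when it comes last in the list. This is a sketch, assuming a grep with the fix; the input and the variable name `count` are illustrative.

```shell
# The second -e supplies an empty pattern, which matches every line,
# so both input lines are counted even though only one contains "abc".
count=$(printf 'abc\nxyz\n' | grep -c -e abc -e '')
echo "$count"
```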

grep -C NUM now outputs separators consistently even when NUM is zero, and similarly for grep -A NUM and grep -B NUM. [bug present since "the beginning"]
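The zero-context case looks like this in practice. This is a sketch, assuming GNU grep 2.19 or later; the input and the variable name `out` are illustrative.

```shell
# Two matching lines with a non-matching line between them.
# With -C 0 no context lines are printed, but the two matches are not
# adjacent in the input, so grep separates them with its "--" separator.
out=$(printf 'hit\nmiss\nhit\n' | grep -C 0 hit)
printf '%s\n' "$out"
```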

grep -f no longer mishandles patterns containing NUL bytes. [bug introduced in grep-2.11]

Plain grep, grep -E, and grep -F now treat encoding errors in patterns the same way the GNU regular expression matcher treats them, with respect to whether the errors can match parts of multibyte characters in data. [bug present since "the beginning"]

grep -w no longer mishandles a potential match adjacent to a letter that takes up two or more bytes in a multibyte encoding. Similarly, the patterns '\<', '\>', '\b', and '\B' no longer mishandle word-boundary matches in multibyte locales. [bug present since "the beginning"]

grep -P now reports an error and exits when given invalid UTF-8 data. Previously it was unreliable, and sometimes crashed or looped. [bug introduced in grep-2.16]

grep -P now works with -w and -x and backreferences. Before, echo aa|grep -Pw '(.)\1' would fail to match, yet echo aa|grep -Pw '(.)\2' would match.

grep -Pw now works like grep -w in that the matched string has to be preceded and followed by non-word components or the beginning and end of the line (as opposed to word boundaries before). Before, this

  echo a@@a| grep -Pw @@

would match, yet this

  echo a@@a| grep -w @@

would not. Now, they both fail to match, per the documentation on how grep's -w works.
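The plain -w half of this comparison can be checked without PCRE support. This is a sketch, with an illustrative variable name `plain_w`. On a grep built with PCRE, replacing -w with -Pw should give the same result after this upgrade, but that variant is left out so the sketch does not depend on how grep was compiled.

```shell
# "@@" sits directly between two word characters ("a"), so under -w
# semantics it is not a whole-word match and grep exits with status 1.
if echo 'a@@a' | grep -w '@@' >/dev/null; then
  plain_w="matched"
else
  plain_w="no match"
fi
echo "grep -w: $plain_w"
```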

grep -i no longer mishandles patterns containing titlecase characters. For example, in a locale containing the titlecase character 'Lj' (U+01C8 LATIN CAPITAL LETTER L WITH SMALL LETTER J), 'grep -i Lj' now matches both 'LJ' (U+01C7 LATIN CAPITAL LETTER LJ) and 'lj' (U+01C9 LATIN SMALL LETTER LJ).

** 2.18 Bug fixes

grep no longer mishandles patterns like [^^-~] in unibyte locales. [bug introduced in grep-2.8]

grep -i in a multibyte, non-UTF8 locale could be up to 200 times slower than in 2.16. [bug introduced in grep-2.17]

** 2.17 Improvements

grep -i in a multibyte locale is now typically 10 times faster for patterns that do not contain \ or [.

grep (without -i) in a multibyte locale is now up to 7 times faster when processing many matched lines.

** 2.16 Bug fixes

The fix to make \s and \S work with multi-byte white space broke the use of each shortcut whenever followed by a repetition operator. For example, \s*, \s+, \s? and \s{3} would all malfunction in a multi-byte locale. [bug introduced in grep-2.15]

The fix to make grep -P work better with UTF-8 made it possible for grep to evoke a larger set of PCRE errors, some of which could trigger an abort. E.g., this would abort:

  printf '\x82'|LC_ALL=en_US.UTF-8 grep -P y

Now grep handles arbitrary PCRE errors. [bug introduced in grep-2.15]

Handle very long lines (2GiB and longer) on systems with a deficient read system call.

** 2.15 Bug fixes

grep's \s and \S failed to work with multi-byte white space characters. For example, \s would fail to match a non-breaking space, and this would print nothing:

  printf '\xc2\xa0' | LC_ALL=en_US.UTF-8 grep '\s'

A related bug is that \S would mistakenly match an invalid multibyte character. For example, the following would match:

  printf '\x82\n' | LC_ALL=en_US.UTF-8 grep '^\S$'

[bug present since grep-2.6]

grep -i would segfault on systems using UTF-16-based wchar_t (Cygwin) when converting an input string containing certain 4-byte UTF-8 sequences to lower case. The conversions to wchar_t and back to a UTF-8 multibyte string did not take surrogate pairs into account. [bug present since at least grep-2.6, though the segfault is new with 2.13]

grep -E would segfault when given a regexp like '([^.]*[M]){1,2}' for any multibyte character M. [bug introduced in grep-2.6, which would segfault, but 2.7 and 2.8 had no problem, and 2.9 through 2.14 would hit a failed assertion.]

grep -F would get stuck in an infinite loop when given a search string that is an invalid byte sequence in the current locale and that matches the bytes of the input twice on a line. Now grep fails with exit status 1.

grep -P could misbehave. While multi-byte mode is only supported by PCRE with UTF-8 locales, grep did not activate it. This would cause failures to match multibyte characters against some regular expressions, especially those including the '.' or '\p' metacharacters.

** 2.15 New features

grep -P can now use a just-in-time compiler to greatly speed up matches. This feature is transparent to the user; no flag is required to enable it. It is only available if the corresponding support in the PCRE library is detected when grep is compiled.

51ddd70 Complete upgrade of gnu grep 2.14 => 2.20
contrib/grep/README.DELETED | 4 +-
contrib/grep/README.DRAGONFLY | 14 +-
gnu/usr.bin/grep/Makefile | 2 +-
gnu/usr.bin/grep/Makefile.inc0 | 14 -
gnu/usr.bin/grep/egrep/Makefile | 8 +-
gnu/usr.bin/grep/egrep/egrep | 11 +
gnu/usr.bin/grep/fgrep/Makefile | 8 +-
gnu/usr.bin/grep/fgrep/fgrep | 11 +
gnu/usr.bin/grep/grep/Makefile | 21 +-
gnu/usr.bin/grep/grep/grep.1 | 38 +-
gnu/usr.bin/grep/libgrep/Makefile | 21 -
gnu/usr.bin/grep/libgreputils/Makefile | 22 +-
gnu/usr.bin/grep/libgreputils/alloca.h | 2 +-
gnu/usr.bin/grep/libgreputils/config.h | 265 ++--
gnu/usr.bin/grep/libgreputils/configmake.h | 4 +-
gnu/usr.bin/grep/libgreputils/ctype.h | 359 ++++++
gnu/usr.bin/grep/libgreputils/dirent.h | 570 +++++++++
gnu/usr.bin/grep/libgreputils/fcntl.h | 18 +-
gnu/usr.bin/grep/libgreputils/getopt.h | 4 +-
gnu/usr.bin/grep/libgreputils/iconv.h | 422 +++++++
gnu/usr.bin/grep/libgreputils/inttypes.h | 1452 +++++++++++++++++++++
gnu/usr.bin/grep/libgreputils/langinfo.h | 478 +++++++
gnu/usr.bin/grep/libgreputils/locale.h | 528 ++++++++
gnu/usr.bin/grep/libgreputils/stdio.h | 1665 +++++++++++++++++++++++++
gnu/usr.bin/grep/libgreputils/stdlib.h | 1276 +++++++++++++++++++
gnu/usr.bin/grep/libgreputils/string.h | 1341 ++++++++++++++++++++
gnu/usr.bin/grep/libgreputils/sys/stat.h | 8 +-
gnu/usr.bin/grep/libgreputils/sys/time.h | 525 ++++++++
gnu/usr.bin/grep/libgreputils/sys/types.h | 54 +
gnu/usr.bin/grep/libgreputils/unistd.h | 1869 ++++++++++++++++++++++++++++
gnu/usr.bin/grep/libgreputils/unistr.h | 2 +-
gnu/usr.bin/grep/libgreputils/unitypes.h | 2 +-
gnu/usr.bin/grep/libgreputils/uniwidth.h | 2 +-
gnu/usr.bin/grep/libgreputils/wchar.h | 1340 ++++++++++++++++++++
gnu/usr.bin/grep/libgreputils/wctype.h | 816 ++++++++++++
35 files changed, 12996 insertions(+), 180 deletions(-)

Upstream: gitweb.dragonflybsd.org

