Undefined Behaviour With Grep -E - By Robert Elder

This article will attempt to convince the reader that it is almost never a good idea to use the '-E' flag with grep (for Extended Regular Expressions), and that you should instead use the '-P' flag (when possible) for Perl-compatible regular expressions. Unfortunately, the '-P' flag is not supported by all implementations of grep, so this may not always be possible, and in those cases it becomes even more important to be mindful of the behaviour described in this article.

I came to this conclusion while doing some research for my Guide To Regular Expressions. Specifically, my goal was to verify my understanding of regular expression quantifiers by testing out a few examples with grep. As a starting point, the following command will instruct grep to print out any lines that contain at least one sequence of exactly 3 'a' characters:

    echo "aaaaaaaaaaaaaaaaa" | grep -P "a{3}"

The '-E' version of my test, however, outputs the following (not expected):

    aaaaaaaaaaaaaaaaa

The above result is not expected since the regex that was specified should only match up to a maximum (and minimum) of 3 'a' characters, and then print each match on a separate line. In other words, most people would probably expect the output to look just like the output from using the '-P' flag.

ERE (Extended Regular Expressions) != Perl Compatible Regular Expressions

Naturally, I had to dig into what was going on here, so I reviewed the source code for grep to see if I had possibly found a bug. After reading through the source code and its comments, such as the following:

    /* In BRE consecutive duplications are not allowed.

it became clear that what I really needed to do was consult the formal specifications for 'Basic Regular Expressions' (BRE) and 'Extended Regular Expressions' (ERE). Therefore, I reviewed the document 'The Open Group Base Specifications', which appears to be a formal specification for BRE and ERE. The following statement in section 9.4.6, 'EREs Matching Multiple Characters', seems to suggest that the behaviour I saw was not actually a bug, but rather 'undefined behaviour':

    The behavior of multiple adjacent duplication symbols ( '+', '*', '?', and intervals) produces undefined results.

The other thing I realized from reviewing this document was that neither BRE nor ERE supports as many features as I thought they did. ERE does support '+', '?' and '|', but it has no support for the concept of 'greedy' or 'lazy' quantifiers. The formal specification for BRE does not even support the '+' or '?' quantifiers, or '|' for alternation! For this reason, I have decided to avoid BRE and ERE whenever I can in the future and use '-P' from now on. Note, however, that if you check the man page for grep, you'll see that it says 'This is experimental and grep -P may warn of unimplemented features.' under the section for '-P'.
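The distinction between well-defined and undefined ERE patterns can be checked with a short script. What follows is only a minimal sketch: the variable names (`input`, `have_pcre`, `ere_count`) are made up for illustration, and it assumes a grep that supports the '-o' flag, which GNU and BSD grep provide but POSIX does not require. It probes for '-P' support first, since not every grep has it, then compares '-E' and '-P' on a pattern with a single duplication symbol, and finally runs a pattern with two adjacent duplication symbols, whose output is deliberately not predicted here.

```shell
#!/bin/sh
# Sketch only: variable names are illustrative and not from the article.

# Test input: a run of 'a' characters, as in the article's example.
input="aaaaaaaaaaaaaaaaa"

# Probe for -P support; grep fails here when PCRE is unavailable.
if printf 'a\n' | grep -P 'a' >/dev/null 2>&1; then
    have_pcre=yes
else
    have_pcre=no
fi
echo "grep -P available: $have_pcre"

# 'a{3}' uses a single duplication symbol, so ERE defines its meaning.
# -o (assumed to be supported) prints each non-overlapping match on its own line.
ere_matches=$(printf '%s\n' "$input" | grep -o -E 'a{3}')
ere_count=$(printf '%s\n' "$ere_matches" | grep -c 'aaa')
echo "ERE matches of a{3}: $ere_count"

# When PCRE is available, compare it against ERE on the same pattern.
if [ "$have_pcre" = yes ]; then
    pcre_matches=$(printf '%s\n' "$input" | grep -o -P 'a{3}')
    if [ "$ere_matches" = "$pcre_matches" ]; then
        echo "-E and -P agree on a{3}"
    else
        echo "-E and -P differ on a{3}"
    fi
fi

# 'a{3}?' places two duplication symbols side by side. POSIX leaves this
# undefined for ERE, so whatever is printed here may vary between greps.
printf '%s\n' "$input" | grep -o -E 'a{3}?' || true
```

Probing for '-P' rather than assuming it keeps the script usable on systems where grep was built without PCRE. In that case only the '-E' checks run, which is exactly where the undefined behaviour needs the most care.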