Regular Expressions: How Greedy is Greedy?

  • google plus

In Brief...

According to Python's how-to documentation on regular expressions, "Repetitions such as * are greedy; when repeating a RE, the matching engine will try to repeat it as many times as possible. If later portions of the pattern don’t match, the matching engine will then back up and try again with fewer repetitions."

Take our Advanced Python Training course for free.

See the Course Outline and Register

Instructions

One way to think of this is that they are greedy, but not so greedy that it will throw away the whole match just to grab a few more characters for themselves.

To illustrate, consider this regular expression:

p = re.compile(r"([ab]*)bb")
p.search('aabbaabbaacc')

This matches 'aabbaabb'. The group matched by ([ab]*) is "aabbaa". How's it work?

  1. As a first check, the .* does go all the way through the string matching "a"s or "b"s until it gets to the last "a" or "b". It finds "aabbaabbaa", but since that is not followed with "bb", this check fails to match: aabbaabbaacc
  2. It then tries again, but this time drops off the last character. That also fails because the match isn't followed by a "bb". aabbaabbaacc
  3. And so it continues. From the beginning, it looks like this:
    1. aabbaabbaacc
    2. aabbaabbaacc
    3. aabbaabbaacc
    4. aabbaabbaacc
    5. aabbaabbaacc
  4. That last one matches: a series of "a"s or "b"s followed by "bb".

More Learning Resources

Author: Nat Dunn

Nat Dunn founded Webucator in 2003 to combine his passion for technical training with his business expertise and to help companies benefit from both. His previous experience was in sales, business and technical training, and management. Nat has an MBA from Harvard Business School and a BA in International Relations from Pomona College.

Discuss