• google plus

Regular Expressions: How Greedy is Greedy?

According to Python's how-to documentation on regular expressions, "Repetitions such as * are greedy; when repeating a RE, the matching engine will try to repeat it as many times as possible. If later portions of the pattern don’t match, the matching engine will then back up and try again with fewer repetitions."

One way to think of this is that they are greedy, but not so greedy that it will throw away the whole match just to grab a few more characters for themselves.

To illustrate, consider this regular expression:

p = re.compile(r"([ab]*)bb")
p.search('aabbaabbaacc')

This matches 'aabbaabb'. The group matched by ([ab]*) is "aabbaa". How's it work?

  1. As a first check, the .* does go all the way through the string matching "a"s or "b"s until it gets to the last "a" or "b". It finds "aabbaabbaa", but since that is not followed with "bb", this check fails to match: aabbaabbaacc
  2. It then tries again, but this time drops off the last character. That also fails because the match isn't followed by a "bb". aabbaabbaacc
  3. And so it continues. From the beginning, it looks like this:
    1. aabbaabbaacc
    2. aabbaabbaacc
    3. aabbaabbaacc
    4. aabbaabbaacc
    5. aabbaabbaacc
  4. That last one matches: a series of "a"s or "b"s followed by "bb".
Author: Nat Dunn

Nat Dunn founded Webucator in 2003 to combine his passion for technical training with his business expertise and to help companies benefit from both. His previous experience was in sales, business and technical training, and management. Nat has an MBA from Harvard Business School and a BA in International Relations from Pomona College.

About Webucator

Webucator provides instructor-led training to students throughout the US and Canada. We have trained over 90,000 students from over 16,000 organizations on technologies such as Microsoft ASP.NET, Microsoft Office, Azure, Windows, Java, Adobe, Python, SQL, JavaScript, Angular and much more. Check out our complete course catalog.