2 Quantifiers and special sub-patterns
Solving the puzzles in this chapter will require you to have a good understanding of the different quantifiers that regular expressions provide, and to pay careful attention to when you should use subpatterns (themselves likely quantified). If you feel rusty about quantifiers or the wildcard character, reviewing the appendix to this book is a good idea.
Roughly speaking, the chapters of this book build from simpler to more complex capabilities of regular expressions. Quantifiers are among the most fundamental capabilities within the mini-language of regexen, so this chapter begins with puzzles that mostly rely on them. Later chapters mix in additional constructs and build on the puzzles of this chapter.
Puzzle 1 Wildcard scope
Summary
Match all and only words that start with x and end with y.
A powerful element of Python regular expression syntax, shared by many other regex engines, is the option of creating either “greedy” or “non-greedy” matches. A greedy quantifier matches as much as it possibly can while still letting the rest of the pattern match. A non-greedy quantifier matches as little as it possibly can before reaching the next part of the pattern.
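As a minimal sketch of that difference, the two quantifier styles can be compared on a tiny made-up string. The string and the names `sample`, `greedy`, and `lazy` are illustrative assumptions, not part of the puzzle text.

```python
import re

# A hypothetical two-word sample, chosen only to show the contrast
sample = "xay xby"

# Greedy: .* runs to the end of the line, then backs up to the LAST y
greedy = re.findall(r'x.*y', sample)

# Non-greedy: .*? stops at the FIRST y that lets the pattern finish
lazy = re.findall(r'x.*?y', sample)

print(greedy)  # ['xay xby']
print(lazy)    # ['xay', 'xby']
```

Note that neither result says anything about word boundaries. Both patterns simply span characters between an x and a y, and that is the behavior the puzzle asks you to think about.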
Suppose you have these two regular expressions:
import re

pat1 = re.compile(r'x.*y')    #1 greedy quantifier
pat2 = re.compile(r'x.*?y')   #2 non-greedy quantifier
Suppose you also have the following block of text that you want to match. You can think of it as a sort of lorem ipsum that has only X words, if you will: