Chapter 17. Regular expressions
This chapter covers
- Understanding regular expressions
- Creating regular expressions with special characters
- Using raw strings in regular expressions
- Extracting matched text from strings
- Substituting text with regular expressions
In some sense, we shouldn’t discuss regular expressions in this book at all. They’re implemented by a single Python module, re, and are advanced enough that they don’t even come as part of the standard library in a language like C. But if you’re using Python, you’re probably doing text parsing; and if you’re doing that, then regular expressions are too useful to be ignored. If you use Perl, Tcl, or UNIX, you may be familiar with regular expressions; if not, this chapter will go into them in some detail.
A regular expression (RE) is a way of recognizing and often extracting data from certain patterns of text. A regular expression that recognizes a piece of text or a string is said to match that text or string. An RE is defined by a string in which certain characters (the so-called metacharacters) can have a special meaning, which enables a single RE to match many different specific strings.
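As a small illustration of one RE matching several different strings, here is a minimal sketch using Python's re module. The pattern and the sample strings are made up for this illustration; the square brackets are metacharacters that mean "any one of the enclosed characters."

```python
import re

# Illustrative pattern (an assumption, not taken from the text):
# [Hh] matches either an uppercase H or a lowercase h,
# so this one RE matches both "Hello" and "hello".
pattern = re.compile("[Hh]ello")

# Made-up sample strings for the demonstration.
candidates = ["hello world", "Hello there", "say hello", "goodbye"]

# search() returns a match object if the pattern occurs anywhere
# in the string, or None if it does not.
matched = [s for s in candidates if pattern.search(s)]
print(matched)  # ['hello world', 'Hello there', 'say hello']
```

The string "goodbye" is left out because nothing in it matches the pattern, whereas the other three strings each contain text the single RE recognizes.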
It’s easier to understand this through example than through explanation. Here’s a program using a regular expression, which counts how many lines in a text file contain the word hello. A line that contains hello more than once will be counted only once: