Chapter 16. Regular expressions
This chapter covers
- Understanding regular expressions
- Creating regular expressions with special characters
- Using raw strings in regular expressions
- Extracting matched text from strings
- Substituting text with regular expressions
Some might wonder why I’m discussing regular expressions in this book at all. Regular expressions are implemented by a single Python module and are advanced enough that they don’t even come as part of the standard library in languages like C or Java. But if you’re using Python, you’re probably doing text parsing; if you’re doing that, regular expressions are too useful to be ignored. If you’ve used Perl, Tcl, or Linux/UNIX, you may be familiar with regular expressions; if not, this chapter goes into them in some detail.
A regular expression (regex) is a way of recognizing and often extracting data from certain patterns of text. A regex that recognizes a piece of text or a string is said to match that text or string. A regex is defined by a string in which certain characters (the so-called metacharacters) can have a special meaning, which enables a single regex to match many different specific strings.
It’s easier to understand this through example than through explanation. Here’s a program with a regular expression that counts how many lines in a text file contain the word hello. A line that contains hello more than once is counted only once: