Chapter 17. Regular expressions

This chapter covers

Understanding regular expressions
Creating regular expressions with special characters
Using raw strings in regular expressions
Extracting matched text from strings
Substituting text with regular expressions

In some sense, we shouldn’t discuss regular expressions in this book at all. They’re implemented by a single Python module, re, and are advanced enough that they don’t even come as part of the standard library in a language like C. But if you’re using Python, you’re probably doing text parsing; and if you’re doing that, then regular expressions are too useful to be ignored. If you use Perl, Tcl, or UNIX, you may be familiar with regular expressions; if not, this chapter covers them in some detail.

17.1. What is a regular expression?

A regular expression (RE) is a way of recognizing, and often extracting data from, certain patterns of text. A regular expression that recognizes a piece of text or a string is said to match that text or string. An RE is defined by a string in which certain characters (the so-called metacharacters) have a special meaning, which enables a single RE to match many different specific strings.

It’s easier to understand this through example than through explanation. Here’s a program that uses a regular expression to count how many lines in a text file contain the word hello. A line that contains hello more than once is counted only once:
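
A minimal sketch of such a program might look like the following. The function name count_matching_lines and the sample lines are assumptions made for illustration; the sample list stands in for the contents of a text file so that the example runs on its own.

```python
import re


def count_matching_lines(lines, pattern="hello"):
    """Count the lines that contain at least one match for pattern."""
    regexp = re.compile(pattern)
    count = 0
    for line in lines:
        # search() looks for a match anywhere in the line, so a line
        # containing several matches still adds only 1 to the count.
        if regexp.search(line):
            count += 1
    return count


# Hypothetical sample data standing in for the lines of a text file.
sample = [
    "hello world\n",
    "nothing here\n",
    "hello, hello again\n",
    "Hello\n",  # not counted: the pattern is case sensitive
]
print(count_matching_lines(sample))
```

Because an open file object yields its lines one at a time, the same function can be applied to a real file by passing it the object returned by open(), for example inside a with block. The pattern here contains no metacharacters, so it matches only the literal text hello in lowercase.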
