Chapter 17. Regular expressions


This chapter covers

  • Understanding regular expressions
  • Creating regular expressions with special characters
  • Using raw strings in regular expressions
  • Extracting matched text from strings
  • Substituting text with regular expressions

In some sense, we shouldn’t discuss regular expressions in this book at all. They’re implemented by a single Python module and are advanced enough that they aren’t even part of the standard library in languages like C. But if you’re using Python, you’re probably doing text parsing, and if you’re doing that, regular expressions are too useful to ignore. If you use Perl, Tcl, or UNIX, you may already be familiar with regular expressions; if not, this chapter covers them in some detail.

17.1. What is a regular expression?

A regular expression (RE) is a way of recognizing and often extracting data from certain patterns of text. A regular expression that recognizes a piece of text or a string is said to match that text or string. An RE is defined by a string in which some of the characters (the so-called metacharacters) have a special meaning, which enables a single RE to match many different specific strings.
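To make the idea of metacharacters concrete, here is a small sketch; the pattern and the sample words are made up for illustration. The metacharacter . matches any single character other than a newline, so the one RE h.llo matches several different strings:

```python
import re

# "." is a metacharacter that matches any single character except a newline,
# so this one RE matches several different specific strings.
pattern = re.compile("h.llo")

words = ["hello", "hallo", "hxllo", "hllo"]
# fullmatch() succeeds only when the entire string fits the pattern.
matches = [word for word in words if pattern.fullmatch(word)]
print(matches)  # ['hello', 'hallo', 'hxllo']
```

The word hllo is rejected because the pattern requires exactly one character between h and llo.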

It’s easier to understand this through example than through explanation. Here’s a program that uses a regular expression to count how many lines in a text file contain the word hello. A line that contains hello more than once is counted only once:
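A minimal sketch of such a program follows. It assumes the lines can come from any iterable of strings, such as an open file object; the function name count_matching_lines and the file name textfile.txt are hypothetical, chosen only for illustration.

```python
import re

# "hello" contains no metacharacters, so this RE matches only the literal text.
regexp = re.compile("hello")

def count_matching_lines(lines):
    """Return how many of the given lines contain at least one match of regexp."""
    count = 0
    for line in lines:
        # search() looks for a match anywhere in the line; a line with
        # several occurrences of hello still adds only 1 to the count.
        if regexp.search(line):
            count += 1
    return count

# Usage with a real file (the name "textfile.txt" is hypothetical):
#     with open("textfile.txt") as file:
#         print(count_matching_lines(file))
```

Iterating over a file object yields one line at a time, so the same function works on a file without reading it all into memory first.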

17.2. Regular expressions with special characters
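As a brief illustration of the kind of special characters this section deals with, the sketch below uses three common ones: a character class ([...]), alternation (|), and the + repetition operator. The patterns and sample strings are illustrative only.

```python
import re

# [Hh] is a character class: it matches either "H" or "h".
greeting = re.compile("[Hh]ello")
# | (alternation) matches either alternative: "hello" or "goodbye".
either = re.compile("hello|goodbye")
# [0-9] matches one digit; + means "one or more of the preceding item".
number = re.compile("[0-9]+")

print(bool(greeting.search("Hello there")))         # True
print(bool(either.search("say goodbye")))           # True
print(number.search("order 4521 shipped").group())  # 4521
```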

17.3. Regular expressions and raw strings
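The sketch below suggests why raw strings matter when writing REs. In an ordinary Python string literal, \b is converted to a backspace character before the re module ever sees it; prefixing the literal with r keeps the backslash, so the RE receives the word-boundary escape \b. The sample text is made up for illustration.

```python
import re

text = "hello, othello"

# In an ordinary string literal, "\b" is the backspace character (\x08),
# so the RE engine never sees a word-boundary escape.
plain = re.compile("\bhello\b")
# In a raw string the backslash is kept, so the RE receives \b,
# which matches at a word boundary.
raw = re.compile(r"\bhello\b")

print(plain.findall(text))  # []
print(raw.findall(text))    # ['hello']
```

The raw pattern skips the hello inside othello because the preceding t is a word character, so there is no word boundary before the h.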

17.4. Extracting matched text from strings
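Parentheses in an RE define groups, and the match object returned by search() lets you retrieve the text each group matched, by number or, with the (?P<name>...) syntax, by name. The pattern and sample string in this sketch are hypothetical.

```python
import re

# Each pair of parentheses defines a group; (?P<name>...) also names it.
pattern = re.compile(r"(?P<last>[A-Za-z]+), (?P<first>[A-Za-z]+)")

match = pattern.search("Name: Smith, Jane")
if match:
    print(match.group(0))       # Smith, Jane  (group 0 is the whole match)
    print(match.group("last"))  # Smith
    print(match.group(2))       # Jane  (groups are also numbered from 1)
    print(match.groups())       # ('Smith', 'Jane')
```

When a pattern contains more than one group, findall() returns a list of tuples, one tuple of group texts per match.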

17.5. Substituting text with regular expressions
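The re.sub() function replaces matched text. This sketch shows three forms of the replacement argument: a plain string, a string containing group references such as \1, and a function that is called once per match. The sample strings are illustrative.

```python
import re

text = "hello world, hello python"

# sub() replaces every non-overlapping match with the replacement string.
print(re.sub("hello", "goodbye", text))
# prints: goodbye world, goodbye python

# \1 and \2 in the replacement refer to the text matched by groups 1 and 2,
# so this rewrites "last, first" as "first last".
print(re.sub(r"([A-Za-z]+), ([A-Za-z]+)", r"\2 \1", "Smith, Jane"))
# prints: Jane Smith

# The replacement may also be a function that receives each match object
# and returns the string to substitute; here it doubles every number.
print(re.sub(r"[0-9]+", lambda m: str(int(m.group()) * 2), "3 apples and 10 pears"))
# prints: 6 apples and 20 pears
```

The replacement string is written as a raw string for the same reason as the pattern: without the r prefix, "\2 \1" would be read by Python as the control characters \x02 and \x01.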

17.6. Summary