Chapter 16. Regular expressions

 

This chapter covers

  • Understanding regular expressions
  • Creating regular expressions with special characters
  • Using raw strings in regular expressions
  • Extracting matched text from strings
  • Substituting text with regular expressions

Some might wonder why I’m discussing regular expressions in this book at all. Regular expressions are implemented by a single Python module, re, and are specialized enough that some languages, such as C, don’t include them in the standard library. But if you’re using Python, you’re probably doing text parsing, and if you’re doing that, regular expressions are too useful to ignore. If you’ve used Perl, Tcl, or Linux/UNIX, you may already be familiar with regular expressions; if not, this chapter goes into them in some detail.

16.1. What is a regular expression?

A regular expression (regex) is a way of recognizing and often extracting data from certain patterns of text. A regex that recognizes a piece of text or a string is said to match that text or string. A regex is defined by a string in which certain characters (the so-called metacharacters) can have a special meaning, which enables a single regex to match many different specific strings.

It’s easier to understand this through example than through explanation. Here’s a program with a regular expression that counts how many lines in a text file contain the word hello. A line that contains hello more than once is counted only once:
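
A minimal sketch of such a program might look like the following. The function name count_lines_containing and the in-memory sample text are assumptions made for illustration; with a real file you would pass the object returned by open() instead of the io.StringIO stand-in.

```python
import io
import re

# Compile the pattern once. "hello" contains no metacharacters,
# so it matches those five characters literally (case-sensitive).
regexp = re.compile("hello")


def count_lines_containing(lines, pattern=regexp):
    """Return the number of lines in which pattern occurs at least once."""
    count = 0
    for line in lines:
        # search() looks for the pattern anywhere in the line. A line is
        # counted once, no matter how many times the pattern occurs in it.
        if pattern.search(line):
            count += 1
    return count


# Stand-in for a text file (hypothetical contents).
sample = io.StringIO(
    "hello world\n"
    "say hello, hello again\n"
    "goodbye\n"
    "Hello there\n"
)
print(count_lines_containing(sample))  # prints 2
```

The second sample line contains hello twice but adds only one to the count, and the last line isn’t counted because the match is case-sensitive.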

16.2. Regular expressions with special characters

16.3. Regular expressions and raw strings

16.4. Extracting matched text from strings

16.5. Substituting text with regular expressions

Summary
