Capture Group in Regular Expressions

Capture groups are a fundamental concept in regular expressions, allowing for the extraction of specific parts of matched text. They are created by wrapping parts of the regex pattern in parentheses. When a regular expression with capture groups matches a string, the parts of the string that correspond to the capture groups can be retrieved using the re.Match.groups() method in Python.

Overview

Capture groups enable the extraction of specific submatches from a string. For instance, consider the regular expression (foo(bar)baz). In this pattern, foo(bar)baz is considered group 1, and bar is group 2. This structure allows you to capture both the entire match and specific submatches within the pattern.

Examples and Figures

Example of Capture Groups

In the following example, capture groups are used to extract parts of the string "chair":

>>> match = re.match(f'([{consonants}]+)(.+)', 'chair')
>>> match.groups()
('ch', 'air')

In this example, two capture groups are defined. The first group captures the leading consonant sound, and the second group captures everything that follows. This is illustrated in Figure 14.7.

[Figure 14.7](https://livebook.manning.com/tiny-python-projects/chapter-14/figure--14-7) We define two capture groups to access the leading consonant sound and whatever follows. Figure 14.7 We define two capture groups to access the leading consonant sound and whatever follows.

Failure to Match

When attempting to use the same pattern on a string that does not start with a consonant, such as "apple", the match fails because the first capture group cannot find a match. As a result, the entire match operation returns None.

Simplified Capture Group

A simpler example demonstrates capturing just the leading consonant sound:

>>> match = re.match(f'([{consonants}]+)', 'chair')
>>> match.groups()
('ch',)

This example shows a single capture group that successfully captures the leading consonant sound “ch” from the word "chair". The concept of adding parentheses to create capture groups is depicted in Figure 14.6.

[Figure 14.6](https://livebook.manning.com/tiny-python-projects/chapter-14/figure--14-6) Adding parentheses around a pattern causes the matching text to be available as a capture group. Figure 14.6 Adding parentheses around a pattern causes the matching text to be available as a capture group.

Conclusion

Capture groups are a powerful feature in regular expressions, allowing for the extraction and manipulation of specific parts of matched text. By using parentheses, you can define multiple capture groups within a single pattern, enabling complex text processing tasks.

FAQ (Frequently asked questions)

What method is used to recover parts of a regex match using capture groups?

What is the purpose of using capture groups in regular expressions?

What effect do parentheses have on a pattern in regular expressions?

What can capture groups in regular expressions be used for?

sitemap

Unable to load book!

The book could not be loaded.

(try again in a couple of minutes)

manning.com homepage
test yourself with a liveTest