8 String Transformations

 

This chapter covers

  • Understanding the finer details of strings and character vectors in R
  • Formatting, transforming, and printing character objects using several useful base R functions
  • Writing regular expressions and using them as patterns for more advanced string-based applications
  • Using regex patterns with functions from the stringr package to work with strings and character vectors

Text is a set of characters strung together, which gives us the term string. Working with strings is something you'll do in R over and over again. Because text is often unstructured, working with it can be quite challenging. Strings play a big role in many data cleaning and preparation tasks, so we need to understand the tools available for dealing with them in base R and in the Tidyverse. We’ll use small and large examples of text in our lessons here. The edr package provides a dataset for this chapter called resto_reviews. It’s not a tibble this time but rather a character vector containing reviews for a restaurant. We’ll occasionally use it to practice with a collection of string-based R functions. Some of these will be base R functions and others will come from the stringr package, which is part of the Tidyverse.
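As a quick taste of what's ahead, here is a minimal sketch of the kind of work this chapter covers. The reviews vector below is a made-up stand-in, not the actual contents of resto_reviews, and the specific functions shown are just a sample of those we'll meet later. The stringr call only runs if that package happens to be installed.

```r
# A hypothetical stand-in for resto_reviews (the real data ships with edr)
reviews <- c(
  "The pasta was excellent and the service was quick.",
  "Too noisy, but the desserts made up for it.",
  "Average food. Friendly staff."
)

# It's a plain character vector, with one review per element
is.character(reviews)
length(reviews)

# Base R: count the characters in each review
n_chars <- nchar(reviews)

# Base R: transform a single string to uppercase
shouting <- toupper(reviews[3])

# Base R: check which reviews contain the pattern "service"
has_service <- grepl("service", reviews)

# stringr: the same pattern check, run only if the package is available
if (requireNamespace("stringr", quietly = TRUE)) {
  stringr::str_detect(reviews, "service")
}
```

Note that every function here takes the whole vector and returns one result per element, which is why a character vector is a convenient container for a set of reviews.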

8.1       How Strings and Character Vectors Work in R

8.1.1   Making Simple Strings and Character Vectors

8.1.2   Strings in Data Frames and Tibbles

8.2       Different Ways to Format Text

8.2.1   Formatting Numbers to Strings with formatC()

8.2.2   Simple String Transformations with base R Functions

8.3       Using Regular Expressions to Work with Text

8.3.1   Regex Basics: Matching Characters and Using Escapes

8.4       Character Sets and Character Classes

8.5       Repetition, Laziness, and Greediness

8.6       Anchors

8.7       Grouping, Capturing, and Backreferences

8.8       Using Regular Expressions in stringr Functions

8.9       Summary