6 Working with text data

This chapter covers:

Removing whitespace from strings
Altering the casing of strings
Replacing characters in a string
Slicing a string on index positions
Splitting a string on occurrences of a delimiter

Real-world data is often messy. Datasets are riddled with stray whitespace, invalid characters, inconsistent casing, and more. One of the primary motivations for creating Pandas was to make it easier to clean up such improperly formatted values. This process of shaping data into a usable form before analysis is called wrangling or munging. In this chapter, we'll explore the methods the library provides for cleaning up text data efficiently.

6.1 String Casing

Let's begin by importing Pandas into our Jupyter Notebook.

In  [1] import pandas as pd

This chapter's first dataset is a listing of 150,000+ food inspections in the city of Chicago, Illinois. It includes two columns, one with the name of each establishment and the other with a risk ranking. Let's take a look.
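As a minimal sketch of what working with such a dataset looks like, the example below builds a tiny stand-in DataFrame. The column names Name and Risk, the variable names, and every row are hypothetical and invented for illustration; they are not taken from the real inspections file. It then applies two string methods through the .str accessor to trim whitespace and normalize casing.

```python
import pandas as pd

# Hypothetical stand-in for the inspections data: the column names and
# rows below are invented for illustration, not read from the real file.
inspections = pd.DataFrame({
    "Name": ["  SUNRISE DINER ", " green leaf grill", "Harbor Tacos  "],
    "Risk": ["Risk 1 (High)", "Risk 2 (Medium)", "Risk 3 (Low)"],
})

# Peek at the first rows, as one would with the full dataset.
print(inspections.head())

# The .str accessor applies a string method to every value in a Series.
# strip() removes leading and trailing whitespace; title() capitalizes
# the first letter of each word and lowercases the rest.
cleaned = inspections["Name"].str.strip().str.title()
print(cleaned.tolist())
```

Chaining works here because each .str method returns a new Series, which leaves the original column untouched until the result is assigned back.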
