6 Working with text data
This chapter covers:
- Removing whitespace from strings
- Altering the casing of strings
- Replacing characters in a string
- Slicing a string on index positions
- Splitting a string on occurrences of a delimiter
Real-world data is often messy. Datasets are riddled with stray whitespace, improper characters, inconsistent casing, and more. One of the primary inspirations for the creation of Pandas was to ease the difficulty of cleaning up these improperly formatted values. The process of smoothing data into an optimal shape before analysis is called wrangling or munging. In this chapter, we'll explore the methods the library provides for cleaning up text data efficiently.
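As a rough preview of the operations listed above, the sketch below applies each one to a small Series through the `.str` accessor. The sample values and variable names are hypothetical and are not taken from this chapter's dataset. The sections that follow cover these methods in more detail.

```python
import pandas as pd

# Hypothetical sample values, invented for illustration only
names = pd.Series(["  Corner Cafe ", "pizza PALACE", "Taco-Stand  "])

# Remove leading and trailing whitespace
stripped = names.str.strip()

# Normalize the casing to lowercase
lowered = stripped.str.lower()

# Replace a literal character (regex=False treats "-" as plain text)
replaced = lowered.str.replace("-", " ", regex=False)

# Slice each string by index position (first four characters)
first_four = replaced.str.slice(0, 4)

# Split each string on occurrences of a delimiter
words = replaced.str.split(" ")
```

Each call returns a new Series and leaves the original untouched, so the steps can be chained or assigned back to a column as needed.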
6.1 String Casing
Let's begin by importing Pandas into our Jupyter Notebook.
In [1] import pandas as pd
This chapter's first dataset is a listing of more than 150,000 food inspections in the city of Chicago, Illinois. It includes two columns: one with the name of each establishment and the other with a risk ranking. Let's take a look.