In chapters 8-11, you learned to create data frames and extract data from them. Now it is time to discuss how data frames can be mutated. By data frame mutation, I mean creating new columns from data in existing columns. For example, you might have a date column in a data frame and want to create a new column that stores the year extracted from that date. In DataFrames.jl, you can achieve this objective in two ways:
- Update the source data frame in place by adding a new column to it.
- Create a new data frame storing only the columns that you will later need in your data analysis pipeline.
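
As a minimal sketch of these two options, consider the year-extraction example. The sample dates and the names `df_inplace`, `df_source`, and `df_new` are made up for illustration; the sketch assumes the `year` function from the Dates standard library and the `select` function with a `ByRow` wrapper from DataFrames.jl:

```julia
using DataFrames
using Dates

# Hypothetical sample data used only for illustration.
dates = [Date(2020, 1, 15), Date(2021, 6, 30), Date(2022, 12, 1)]

# Option 1: update the source data frame in place by adding a new column.
df_inplace = DataFrame(date = dates)
df_inplace.year = year.(df_inplace.date)

# Option 2: create a new data frame that stores only the needed column.
# The source data frame is left unchanged.
df_source = DataFrame(date = dates)
df_new = select(df_source, :date => ByRow(year) => :year)
```

After running this code, `df_inplace` holds both the `date` and `year` columns, while `df_new` holds only `year` and `df_source` still has just its original `date` column.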
This chapter covers both approaches. Data frame mutation is a fundamental step in data science projects. As discussed in chapter 1, after ingesting the source data, you need to prepare it before it can be analyzed for insights. This data preparation process typically involves tasks such as data cleaning and transformation, which are usually achieved by mutating existing columns of a data frame.