chapter thirteen

13 Advanced transformations of data frames

This chapter covers

  • Performing advanced transformations of data frames and grouped data frames
  • Chaining transformation operations to create data processing pipelines
  • Sorting, joining, and reshaping data frames
  • Working with categorical data
  • Evaluating classification models

In chapter 12, you learned how to perform basic transformations of data frames with the combine function and operation specification syntax. In this chapter, you will learn more advanced ways to use this syntax, along with additional functions that accept it: select, select!, transform, transform!, subset, and subset!. With these functions, you can conveniently perform any operation on columns you need. At the same time, these functions are optimized for speed and can optionally use multiple threads to perform computations. As in chapter 12, I also show you how to specify these transformations using the DataFramesMeta.jl domain-specific language.
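As a rough illustration of how these functions differ, here is a minimal sketch on a small, hypothetical data frame. The columns id, x, and g are invented for this example and are not part of the police stop data set, and the code assumes DataFrames.jl is installed. In short, select keeps only the requested columns, transform keeps all columns and adds new ones, subset filters rows, and the ! variants modify the source data frame in place.

```julia
using DataFrames

# Hypothetical toy data (not the police stop data set)
df = DataFrame(id=1:4, x=[1.0, 2.0, 3.0, 4.0], g=["a", "b", "a", "b"])

# select: new data frame holding only the requested columns;
# the anonymous function receives the whole :x column at once
df_sel = select(df, :id, :x => (v -> v .* 2) => :x2)

# transform: new data frame keeping all columns plus the new one;
# ByRow applies sqrt to each element separately
df_tr = transform(df, :x => ByRow(sqrt) => :sqrt_x)

# subset: keep rows for which the condition is true
df_sub = subset(df, :g => ByRow(==("a")))

# The same functions accept grouped data frames; here the per-group
# sum is broadcast to every row of its group, in the original row order
df_grp = transform(groupby(df, :g), :x => sum => :x_group_sum)

# The ! variants modify df in place instead of returning a new data frame
transform!(df, [:id, :x] => ByRow(+) => :id_plus_x)
subset!(df, :x => ByRow(>(1.5)))
select!(df, :id, :id_plus_x)
```

Because subset requires each condition to produce a Boolean vector, the row-wise comparisons are wrapped in ByRow, whereas a function passed without ByRow, like the one in the select call, works on the entire column.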

In this chapter, you will also learn how to combine multiple tables using join operations. DataFrames.jl has an efficient implementation of all standard joins: inner join, left and right joins, outer join, semi and anti joins, and cross join. I will also show you how to reshape data frames with the stack and unstack functions.
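The following minimal sketch shows the join functions and stack/unstack on tiny, hypothetical tables (people, scores, and wide are invented for illustration). It assumes DataFrames.jl is installed. Because the row order of join results may not follow the order of the inputs by default, the comments describe which keys appear rather than where they appear.

```julia
using DataFrames

# Two small hypothetical tables that share the key column :id
people = DataFrame(id=[1, 2, 3], name=["Ann", "Bob", "Cy"])
scores = DataFrame(id=[2, 3, 4], score=[10, 20, 30])

inner = innerjoin(people, scores, on=:id)  # keys found in both tables: 2 and 3
left  = leftjoin(people, scores, on=:id)   # all rows of people; score is missing for id 1
right = rightjoin(people, scores, on=:id)  # all rows of scores; name is missing for id 4
outer = outerjoin(people, scores, on=:id)  # union of keys: 1, 2, 3, and 4
semi  = semijoin(people, scores, on=:id)   # rows of people that have a match; people's columns only
anti  = antijoin(people, scores, on=:id)   # rows of people without a match (id 1)
# crossjoin pairs every row with every row (3 * 3 = 9 rows); the shared :id
# column is dropped from scores first to avoid a duplicate column name
cross = crossjoin(people, select(scores, :score))

# Reshaping: stack moves the q1 and q2 columns into long format
wide = DataFrame(id=[1, 2], q1=[5, 6], q2=[7, 8])
long = stack(wide, [:q1, :q2])                # columns id, variable, value; 4 rows
back = unstack(long, :id, :variable, :value)  # back to one column per variable
```

Here stack and unstack are inverse operations: the long table has one row per (id, variable) pair, and unstack spreads the values back into one column per distinct entry of the variable column.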

13.1  Getting and pre-processing the police stop data set

13.1.1 Loading all required packages

13.1.2 Introducing the @chain macro

13.1.3 Getting the police stop data set

13.1.4 Comparison of functions that perform operations on columns

13.1.5 Short forms of operation specification syntax

13.2  Investigating the violation column

13.2.1 Finding the most frequent violations

13.2.2 Vectorizing functions using the ByRow wrapper

13.2.3 Flattening data frames

13.2.4 Convenience syntax for getting the number of rows of a data frame

13.2.5 Sorting data frames

13.2.6 Advanced functionalities of the DataFramesMeta.jl package

13.3  Preparing data for making predictions

13.3.1 Initial transformation of the data

13.3.2 Working with categorical data

13.3.3 Joining data frames

13.3.4 Reshaping data frames

13.3.5 Dropping rows of a data frame that hold missing values

13.4  Building a predictive model of arrest probability