13 Advanced transformations of data frames

 

This chapter covers

  • Performing advanced transformations of data frames and grouped data frames
  • Chaining transformation operations to create data processing pipelines
  • Sorting, joining, and reshaping data frames
  • Working with categorical data
  • Evaluating classification models

In chapter 12, you learned how to perform basic transformations of data frames by using operation specification syntax with the combine function. In this chapter, you will learn more advanced scenarios for using this syntax, along with more functions that accept it: select, select!, transform, transform!, subset, and subset!. With these functions, you can conveniently express the column operations you need: selecting columns, creating new ones, and filtering rows. These functions are also optimized for speed and can optionally use multiple threads to perform computations. As in chapter 12, I also show you how to specify these transformations by using the DataFramesMeta.jl domain-specific language.
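As a rough preview of how these functions differ, here is a minimal sketch. The data frame and its column names (id, x, grp) are made up for illustration and are not the data set used in this chapter; only the DataFrames.jl functions themselves come from the text above.

```julia
using DataFrames

# Hypothetical toy data, for illustration only.
df = DataFrame(id = 1:4, x = [1.0, 2.0, 3.0, 4.0], grp = ["a", "b", "a", "b"])

# select keeps only the columns named or produced by the given operations.
selected = select(df, :id, :x => ByRow(sqrt) => :sqrt_x)

# transform keeps all source columns and appends the newly computed one.
transformed = transform(df, :x => (v -> v .- sum(v) / length(v)) => :x_centered)

# subset keeps only the rows for which the condition is true.
subsetted = subset(df, :x => ByRow(>(2.0)))

# The variants ending in ! (select!, transform!, subset!) update df in place.
transform!(df, :x => ByRow(abs2) => :x_squared)
```

In this sketch, select, transform, and subset each return a new data frame and leave df unchanged, whereas the final transform! call adds the x_squared column to df itself.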

In this chapter, you will also learn to combine multiple tables by using join operations. DataFrames.jl has an efficient implementation for all standard joins: inner joins, left and right joins, outer joins, semi and anti joins, and cross joins. I will also show you how to reshape data frames between wide and long formats with the stack and unstack functions.
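The following sketch gives a first impression of the joins and reshaping functions listed above. The two tables (people and scores) and their columns are hypothetical and serve only to show which rows each join keeps.

```julia
using DataFrames

# Hypothetical toy tables, for illustration only.
people = DataFrame(id = [1, 2, 3], name = ["Ann", "Bob", "Cid"])
scores = DataFrame(id = [2, 3, 4], q1 = [10, 20, 30], q2 = [11, 21, 31])

inner = innerjoin(people, scores, on = :id)  # ids present in both tables
left  = leftjoin(people, scores, on = :id)   # all ids from people
right = rightjoin(people, scores, on = :id)  # all ids from scores
outer = outerjoin(people, scores, on = :id)  # ids from either table
semi  = semijoin(people, scores, on = :id)   # rows of people that have a match
anti  = antijoin(people, scores, on = :id)   # rows of people without a match

# A cross join pairs every row of one table with every row of the other.
# Single columns are selected first so that the column names do not clash.
cross = crossjoin(select(people, :name), select(scores, :q1))

# stack reshapes from wide to long format: one row per (id, variable) pair.
long = stack(scores, [:q1, :q2])

# unstack reverses the operation, going from long back to wide format.
wide = unstack(long, :id, :variable, :value)
```

With these toy tables, only ids 2 and 3 occur in both people and scores, so the inner and semi joins keep two rows each, while the anti join keeps the single row with id 1.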

13.1 Getting and preprocessing the police stop data set

13.1.1 Loading all required packages

13.1.2 Introducing the @chain macro

13.1.3 Getting the police stop data set

13.1.4 Comparing functions that perform operations on columns

13.1.5 Using short forms of operation specification syntax

13.2 Investigating the violation column

13.2.1 Finding the most frequent violations

13.2.2 Vectorizing functions by using the ByRow wrapper

13.2.3 Flattening data frames

13.2.4 Using convenience syntax to get the number of rows of a data frame