chapter nine

9 Getting data from a data frame

This chapter covers

Subsetting rows of a data frame
Selecting columns of a data frame
Creating local linear regression (LOESS) models
Visualizing LOESS predictions

In chapter 8, you learned the basic principles of working with data frames in Julia using the DataFrames.jl package, and we started to analyze the Lichess chess puzzle data. Recall that our objective was to identify the relationship between puzzle difficulty and popularity.

In section 8.3, we stopped our investigation after concluding that the original data should be cleaned before the final analysis. (Figure 9.1 reproduces the histograms we used in chapter 8 to conclude that the original data is significantly skewed.) The simplest form of data cleaning is removing unwanted observations. Therefore, in this chapter, you will learn how to get data from a data frame by subsetting its rows and selecting its columns.
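As a first taste of what subsetting rows and selecting columns looks like, here is a minimal sketch using DataFrames.jl indexing. The toy data frame and its values are made up for illustration. The column names only mimic the puzzles data, and the `NbPlays .> 100` condition is an arbitrary assumption, not the cleaning rule used later in the chapter.

```julia
using DataFrames

# Hypothetical toy data: column names mimic the puzzles data, values are invented
puzzles = DataFrame(PuzzleId = ["a", "b", "c", "d"],
                    Rating = [1500, 900, 2100, 1500],
                    Popularity = [90, -20, 75, 60],
                    NbPlays = [1200, 15, 300, 800])

# Row selector: a Boolean vector marking the observations to keep (assumed condition)
# Column selector: a vector of column names
good = puzzles[puzzles.NbPlays .> 100, [:Rating, :Popularity]]
```

The first index picks rows and the second picks columns, so a single indexing expression can do both parts of the cleaning at once.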

Our goal in this chapter is to check the relationship between a puzzle's difficulty and how much users like it. To perform this analysis, we will take the following steps:

Subset the data set to concentrate only on the columns and rows that we want to analyze later.
Aggregate data about the relationship between puzzle difficulty and popularity in a data frame and plot it.
Build a local linear regression (LOESS) model to obtain better summary information about the relationships present in the data.
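The first two steps above can be sketched on a toy data frame as follows. This is only an illustration under stated assumptions: the data values, the `NbPlays` threshold, and the `MeanPopularity` column name are hypothetical, and the chapter may perform these steps differently. The plotting and LOESS steps are left out of this sketch.

```julia
using DataFrames
using Statistics

# Hypothetical toy data: column names mimic the puzzles data, values are invented
puzzles = DataFrame(Rating = [1500, 900, 2100, 1500],
                    Popularity = [90, -20, 75, 70],
                    NbPlays = [1200, 15, 300, 800])

# Step 1: keep only the rows and columns of interest (assumed threshold of 100 plays)
played = subset(puzzles, :NbPlays => ByRow(>(100)))
good = select(played, :Rating, :Popularity)

# Step 2: aggregate mean popularity for each rating value;
# sort=true makes the group order deterministic (ascending by Rating)
rating_mapping = combine(groupby(good, :Rating; sort=true),
                         :Popularity => mean => :MeanPopularity)
```

The resulting `rating_mapping` data frame has one row per distinct rating, which is a convenient shape for plotting mean popularity against difficulty.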

9.1 Advanced data frame indexing

9.1.1 Getting a reduced puzzles data frame

9.1.2 Overview of allowed column selectors

9.1.3 Overview of allowed row-subsetting values

9.1.4 Making views of data frame objects

9.2 Analyzing the relationship between puzzle difficulty and popularity

9.2.1 Calculating mean puzzle popularity by its rating

9.2.2 Fitting LOESS regression

Summary