In chapter 8, you learned the basic principles of working with data frames in Julia, as provided by the DataFrames.jl package, and we started to analyze the Lichess chess puzzle data. Recall that our objective was to identify the relationship between puzzle difficulty and popularity.
In section 8.3, we paused our investigation after concluding that the original data should be cleaned before its final analysis (in figure 9.1, I reproduce the histograms we used in chapter 8 to conclude that the original data is significantly skewed). The simplest form of data cleaning is removing unwanted observations. Therefore, in this chapter, you will learn how to get data from a data frame by subsetting its rows and selecting its columns.
Our goal in this chapter is to check the relationship between a puzzle's difficulty and how much users like it. To perform this analysis, we will take the following steps:
- Subset the data set to keep only the columns and rows that we want to analyze later.
- Aggregate data about the relationship between puzzle difficulty and popularity in a data frame and plot it.
- Fit a local regression (LOESS) model to obtain better summary information about the relationships present in the data.
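
To make the first two steps concrete before we turn to the real data, here is a minimal sketch on a tiny, made-up data frame. The column names `Rating`, `Popularity`, and `NbPlays`, all the values, and the cutoff of 100 plays are assumptions chosen only for illustration; they are not taken from the puzzle data or from the analysis that follows. The LOESS step is left out because it needs an additional package.

```julia
using DataFrames
using Statistics

# Hypothetical toy data standing in for the puzzle data set (values are made up).
puzzles = DataFrame(
    Rating     = [1500, 1500, 1600, 1600, 1700, 900],
    Popularity = [80, 90, 70, 50, 95, -20],
    NbPlays    = [120, 300, 250, 40, 500, 10],
)

# Step 1: subset rows with a Boolean condition and select two columns.
# The threshold of 100 plays is an arbitrary illustration value.
good = puzzles[puzzles.NbPlays .> 100, [:Rating, :Popularity]]

# Step 2: aggregate, here as the mean popularity for each rating value.
by_rating = combine(
    groupby(good, :Rating; sort=true),
    :Popularity => mean => :MeanPopularity,
)
```

In this sketch, `good` keeps only the rows that pass the filter, and `by_rating` holds one row per distinct rating, which is the shape of data we would then plot.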