In this chapter, we’ll demonstrate how to fit two types of regression models: multiple linear regressions and regression trees. Our dependent, or target, variable will be regular season wins, and our independent variables, or predictors, will be the full complement of hustle statistics that the NBA began recording during the 2016-17 season. These statistics include, but aren’t limited to, blocked shots, deflections, and loose balls recovered. In other words, we’ll be regressing wins against a set of hustle statistics.
Our hypothesis is that at least some of these hustle statistics have a meaningful influence on wins and losses. But which ones, and by how much? We’ll begin with a thorough exploration of the data, during which we’ll be laser-focused on identifying and treating outliers, testing for normal distributions, and computing correlation coefficients. We’ll then fit a multiple linear regression as a first test and a regression tree as a second test.
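To make the two modeling steps concrete before we turn to real data, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the data is synthetic (not actual NBA figures), the variable names `deflections`, `loose_balls`, and `wins` are hypothetical, and the helper functions `fit_ols` and `fit_stump` are hand-rolled stand-ins rather than any library's API. The "tree" is deliberately reduced to a single split on one predictor, which is the basic building block of a regression tree and not a full implementation.

```python
import random

# Synthetic toy data: NOT real NBA statistics. The generating relationship
# below is an assumption chosen only so the example has something to recover.
rng = random.Random(42)
n_teams = 60
deflections = [rng.uniform(10, 18) for _ in range(n_teams)]
loose_balls = [rng.uniform(5, 10) for _ in range(n_teams)]
wins = [2.0 * d + 1.5 * b + rng.gauss(0, 1)
        for d, b in zip(deflections, loose_balls)]


def fit_ols(X, y):
    """Multiple linear regression: solve the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]  # [intercept, slope_1, slope_2, ...]


def fit_stump(x, y):
    """One-split regression tree: pick the threshold on x that minimizes
    the total sum of squared errors around the two leaf means."""
    def sse(vals):
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best = None
    for t in sorted(set(x))[1:]:  # skip the minimum so both leaves are non-empty
        left = [yi for xi, yi in zip(x, y) if xi < t]
        right = [yi for xi, yi in zip(x, y) if xi >= t]
        score = sse(left) + sse(right)
        if best is None or score < best[0]:
            best = (score, t, sum(left) / len(left), sum(right) / len(right))
    return best[1], best[2], best[3]  # threshold, left-leaf mean, right-leaf mean


coefs = fit_ols(list(zip(deflections, loose_balls)), wins)
threshold, left_mean, right_mean = fit_stump(deflections, wins)
print("OLS coefficients (intercept, deflections, loose_balls):", coefs)
print("Stump: split deflections at", threshold, "->", left_mean, "vs", right_mean)
```

The linear model summarizes each predictor's influence with a single slope, whereas the tree partitions teams into groups and predicts each group's average number of wins. That contrast is why it's useful to fit both models to the same data.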