Data science begins with data. We gather data. We prepare data. And we use data in predictive models. The better the data, the better the models, whether they draw on traditional statistics or machine learning.
By manipulating data and creating new measures from the original ones, we enhance model performance and efficiency. This is the work of feature engineering.
Visualizations help us learn from data. They demonstrate relationships among variables. They suggest useful transformations. They point to modeling problems, outliers, and unusual patterns in data.
Put these together—data preparation, feature engineering, and visualization—and you have the essence of Gary Sutton’s Statistics Slam Dunk: Statistical analysis with R on real NBA data. Drawing on many examples from professional basketball, Sutton provides a thorough introduction to exploratory data analysis for sports analytics and data science.
With its many packages and tools, R is the obvious choice for this book. We can expect Sutton’s work to be well received by R enthusiasts, especially those who wish to transition from base R to tidyverse functions. The mechanics of R programming are well illustrated in Statistics Slam Dunk.