Data science begins with data. We gather data. We prepare data. And we use data in predictive models. The better the data, the better the models, whether they draw on traditional statistics or machine learning.
By deriving new variables and measures from the original data, we improve model performance and efficiency. This is the work of feature engineering.
Visualizations help us learn from data. They demonstrate relationships among variables. They suggest useful transformations. They point to modeling problems, outliers, and unusual patterns in data.
Put these together—data preparation, feature engineering, and visualization—and you have the essence of Gary Sutton’s Statistics Slam Dunk: Statistical analysis with R on real NBA data. Drawing on many examples from professional basketball, Sutton provides a thorough introduction to exploratory data analysis for sports analytics and data science.