Part 2. Modeling methods
In part 1, we discussed the initial stages of a data science project. Once you’ve pinned down the questions you want to answer and the scope of the problem you want to solve, it’s time to analyze the data and find the answers. In part 2, we work with powerful modeling methods from statistics and machine learning.
Chapter 5 covers how to identify appropriate modeling methods to address your specific business problem. It also discusses how to evaluate the quality and effectiveness of models that you or others have discovered. The remaining chapters in part 2 cover specific modeling techniques.
Chapter 6 covers what we call memorization-based techniques. These methods make predictions based primarily on summary statistics of your data. We cover lookup tables, nearest-neighbor methods, Naive Bayes classification, and decision trees. Chapter 7 covers methods that fit simple functions with an additive structure: linear and logistic regression. Both regression methods not only make predictions but also give you information about the relationship between the input variables and the outcome.