Chapter 3. Modeling and prediction
This chapter covers
- Discovering relationships in data through ML modeling
- Using models for prediction and inference
- Building classification models
- Building regression models
The previous chapter covered guidelines and principles of data collection, preprocessing, and visualization. The next step in the machine-learning workflow is to use that data to explore and uncover the relationships between the input features and the target. In machine learning, you do this by building statistical models from the data. This chapter covers the basics required to understand ML modeling and to start building your own models. In contrast to most machine-learning textbooks, we spend little time discussing the various approaches to ML modeling and instead focus on the big-picture concepts. This will help you gain a broad understanding of machine-learning model building and quickly get up to speed on building your own models to solve real-world problems. If you want more information about specific ML modeling techniques, see the appendix.
3.5. Terms from this chapter
Word | Definition |
---|---|
model | The primary output of applying an ML algorithm to training data. |
prediction | Predictions are made by passing new data through the model. |
inference | The act of gaining insight into the data by building and examining the model, rather than by making predictions. |
(non)parametric | Parametric models make explicit assumptions about the structure of the data, such as a fixed functional form. Nonparametric models don’t. |
(un)supervised | Supervised models, such as classification and regression, find the mapping between the input features and the target variable. Unsupervised models are used to find patterns in the data without a specified target variable. |
clustering | A form of unsupervised learning that groups data into clusters discovered from the data itself. |
dimensionality reduction | Another form of unsupervised learning that can map high-dimensional datasets to a lower-dimensional representation, usually for plotting in two or three dimensions. |
classification | A supervised learning method that assigns data to discrete categories (buckets). |
regression | A supervised learning method that predicts numerical target values. |
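
To see how several of these terms fit together, the following is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy data (hours studied, exam scores, pass/fail labels) is made up, and the helper names `fit_line`, `predict_score`, and `classify_1nn` are hypothetical and do not come from the chapter. It fits a straight line by least squares (a parametric regression model), reads the slope for inference, uses the line for prediction, and then classifies a new point with a one-nearest-neighbor rule (a simple nonparametric classifier).

```python
# Made-up toy training data: hours studied (input feature),
# exam score (numerical target), and pass/fail (categorical target).
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 56.0, 58.0, 60.0]
labels = ["fail", "fail", "pass", "pass", "pass"]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares.

    Parametric: we assume up front that the relationship is a straight line.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Model: the fitted parameters are the product of running the algorithm
# on the training data.
slope, intercept = fit_line(hours, scores)


def predict_score(h):
    """Prediction (regression): pass a new input through the fitted model."""
    return slope * h + intercept


def classify_1nn(h, train_x, train_labels):
    """Classification: return the label of the closest training point.

    Nonparametric: no functional form is assumed; the training data
    itself acts as the model.
    """
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - h))
    return train_labels[nearest]


# Inference: inspect the model itself to learn about the data.
print("score gained per extra hour:", slope)

# Prediction: apply the model to inputs that were not in the training data.
print("predicted score for 6 hours:", predict_score(6.0))
print("predicted class for 1.4 hours:", classify_1nn(1.4, hours, labels))
```

The same fitted line serves both purposes: reading `slope` is inference, whereas calling `predict_score` on new inputs is prediction. A real project would typically rely on an ML library rather than hand-written helpers like these.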