2 Machine learning for regression

This chapter covers

Creating a car-price prediction project with a linear regression model
Doing an initial exploratory data analysis with Jupyter notebooks
Setting up a validation framework
Implementing the linear regression model from scratch
Performing simple feature engineering for the model
Keeping the model under control with regularization
Using the model to predict car prices

In chapter 1, we talked about supervised machine learning, in which we teach machine learning models how to identify patterns in data by giving them examples.

Suppose that we have a dataset that describes cars by characteristics such as make, model, and age, and we would like to use machine learning to predict their prices. These characteristics are called features, and the price is the target variable: the thing we want to predict. The model takes the features as input and combines them to output a predicted price.
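To make the terminology concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the car record is made up, the helper names `split_features_target` and `predict_price` are hypothetical, and the two numbers inside the "model" are picked by hand rather than learned from data.

```python
# A made-up car record, for illustration only (not from a real dataset).
car = {"make": "toyota", "model": "corolla", "age": 3, "price": 15000}


def split_features_target(record, target_name="price"):
    """Separate a record into its features and its target value."""
    features = {name: value for name, value in record.items() if name != target_name}
    target = record[target_name]
    return features, target


# Hand-picked numbers standing in for what a trained model would learn.
BASE_PRICE = 20000.0
PRICE_CHANGE_PER_YEAR = -1500.0


def predict_price(features):
    """Combine the features into a price. This toy version uses only the age."""
    return BASE_PRICE + PRICE_CHANGE_PER_YEAR * features["age"]


features, target = split_features_target(car)
prediction = predict_price(features)
print(features, target, prediction)
```

The point of the sketch is the shape of the problem, not the numbers: the features go in, a single predicted price comes out, and the known price is kept aside as the target to compare against.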

This is an example of supervised learning: we know the prices of some cars, and we can use that information to predict the prices of others. In chapter 1, we also talked about different types of supervised learning: regression and classification. When the target variable is numerical, we have a regression problem, and when the target variable is categorical, we have a classification problem.
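The distinction between the two problem types can be sketched as a simple check on the target values. The function name `problem_type` is hypothetical, and the rule is deliberately naive: it assumes that categories are stored as non-numeric values such as strings, so class labels encoded as numbers (0 and 1, for example) would be misread as a regression target.

```python
from numbers import Number


def problem_type(target_values):
    """Guess the problem type from the target values (naive illustration).

    Assumes categorical targets are stored as non-numeric values.
    """
    is_numerical = all(
        isinstance(value, Number) and not isinstance(value, bool)
        for value in target_values
    )
    return "regression" if is_numerical else "classification"


# Car prices are numbers, so predicting them is a regression problem.
print(problem_type([15000, 8200.5, 23990]))

# Labels such as "cheap" and "expensive" are categories.
print(problem_type(["cheap", "expensive", "cheap"]))
```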

2.1 Car-price prediction project

2.1.1 Downloading the dataset

2.2 Exploratory data analysis

2.2.1 Exploratory data analysis toolbox

2.2.2 Reading and preparing data

2.2.3 Target variable analysis

2.2.4 Checking for missing values

2.2.5 Validation framework

2.3 Machine learning for regression

2.3.1 Linear regression

2.3.2 Training linear regression model

2.4 Predicting the price

2.4.1 Baseline solution

2.4.2 RMSE: Evaluating model quality

2.4.3 Validating the model

2.4.4 Simple feature engineering