4 Fitting a linear regression

 

This chapter covers

  • Model fitting
  • Model evaluation
  • Model assumption tests
  • Data exploration, testing for normality, and detecting outliers

Linear regression is a supervised learning method, meaning it learns from labeled data (data for which both the input features and the corresponding outputs are known) to predict a quantitative response from one or more independent variables. The fitted model is then used to make predictions on new, unseen data. Although linear regression may lack the spark of random forests (see chapter 5) and other more contemporary methods, it remains a tool at or near the top of every data scientist's toolbox. Furthermore, linear regression is a foundational model that is easy both to understand and to implement, with virtually endless use cases. Here are just a few examples:

  • Marketing organizations predicting sales revenues based on advertising expenditures across multiple channels
  • Buyers and lenders predicting future home prices based on square footage, number of bedrooms and bathrooms, age and condition of the property, architectural design, and lot size
  • University admissions officers predicting student performance based on high school GPAs and standardized test scores
  • Retailers predicting product demand based on historical sales figures
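To make the idea concrete before the formal treatment in section 4.1, here is a minimal sketch of fitting a simple linear regression with NumPy's least-squares solver; the advertising-spend and revenue numbers are made up for illustration, and the chapter's own examples may use different tooling:

```python
import numpy as np

# Hypothetical data: ad spend (in $1,000s) and observed sales revenue
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])

# Fit y = intercept + slope * x by ordinary least squares:
# stack a column of ones so the intercept is estimated too
A = np.column_stack([np.ones_like(x), x])
(intercept, slope), *_ = np.linalg.lstsq(A, y, rcond=None)

# Apply the fitted model to a new, unseen spend level
y_new = intercept + slope * 6.0
print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction={y_new:.2f}")
```

The fitted slope and intercept are exactly the quantities that the linear equation in section 4.1.1 formalizes, and the quality of this fit is what the goodness-of-fit measures in section 4.1.2 assess.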

4.1 Primer on linear regression

4.1.1 Linear equation

4.1.2 Goodness of fit

4.1.3 Conditions for best fit

4.2 Simple linear regression

4.2.1 Importing and exploring the data

4.2.2 Fitting the model

4.2.3 Interpreting and evaluating the results

4.2.4 Testing model assumptions

Summary