7 Number go up! (or down) Correlation and linear regression
This chapter covers
- The Pearson correlation and how it can be used in a hypothesis test for a linear relationship between two variables
- How to predict values of correlated variables using linear regression
- Metrics and assumptions for validating correlation and linear regression models
Linear regression is a type of statistical (and machine learning) model that fits a linear function between independent (input) and dependent (output) variables given some data. A line fit to the data can then be used to make predictions for data it has not seen before, assuming there is indeed a linear relationship between the variables. So far, we have focused on only one variable at a time. But it can be helpful to predict or understand hypothesized relationships between multiple variables, such as how much a plant will grow given a certain number of hours of sunlight. Sometimes these relationships resemble a straight line, which makes predictions straightforward. Linear relationships may sound elementary, but they are a foundational part of even the most advanced models in statistics and machine learning. That makes linear regression a great building block to master!
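To make the idea concrete before going further, here is a minimal sketch of fitting a line to the plant example and using it to predict an unseen value. The measurements are made up purely for illustration, the names `sunlight_hours`, `growth_cm`, and `predict_growth` are hypothetical, and the fit uses the standard least-squares formulas for a line with only the Python standard library. Treat it as a preview under those assumptions rather than the approach this chapter will settle on.

```python
from statistics import mean

# Hypothetical, made-up measurements for illustration only.
sunlight_hours = [2.0, 4.0, 6.0, 8.0, 10.0]   # independent (input) variable
growth_cm = [1.1, 2.0, 2.9, 4.2, 4.8]         # dependent (output) variable

x_bar = mean(sunlight_hours)
y_bar = mean(growth_cm)

# Least-squares slope: how x and y vary together, divided by how much x varies.
numerator = sum((x - x_bar) * (y - y_bar)
                for x, y in zip(sunlight_hours, growth_cm))
denominator = sum((x - x_bar) ** 2 for x in sunlight_hours)
slope = numerator / denominator

# The fitted line passes through the point of means (x_bar, y_bar).
intercept = y_bar - slope * x_bar


def predict_growth(hours):
    """Predict growth in cm from the fitted line for a given number of hours."""
    return slope * hours + intercept


print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"predicted growth at 7 hours: {predict_growth(7.0):.2f} cm")
```

Note that 7 hours does not appear in the data. The prediction is only as trustworthy as the assumption that the relationship really is linear.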
Linear regression has several strengths that make it a workhorse in statistics and machine learning: