chapter fourteen

14 Fitting functions to data

This chapter covers

Measuring how closely a function models a data set
Exploring spaces of functions determined by constants
Using gradient descent to optimize the quality of “fit”
Modeling data sets with different kinds of functions

The calculus techniques you learned in part 2 apply only to well-behaved functions. For a derivative to exist, a function needs to be sufficiently smooth, and to calculate an exact derivative or integral, you need the function to have a simple formula. With most real-world data, we aren’t so lucky. Because of randomness or measurement error, we rarely come across perfectly smooth functions in the wild. In this chapter, we cover how to take messy data and model it with a simple mathematical function, a task called regression.

I’ll walk you through an example on a real data set consisting of 740 used cars listed for sale on the website CarGraph.com. These cars are all Toyota Priuses, and each listing reports the car's mileage and sale price. Plotting this data on a scatter plot (figure 14.1), we can see a downward trend in price as mileage increases. This reflects the fact that cars lose value as they are driven. Our goal is to come up with a simple function that describes how the price of a used Prius changes as its mileage increases.

Figure 14.1 A plot of price vs. mileage for used Toyota Priuses listed for sale on CarGraph.com
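As a rough illustration of this goal, here is a minimal sketch that models price as a function of mileage and scores how closely the model tracks a few data points. The data points, the linear form of the model, and the constants `a` and `b` are all hypothetical placeholders, not values from the CarGraph.com data set. The score is the sum of squared errors, one common way to measure the quality of fit.

```python
# Hypothetical (mileage, price) pairs; NOT taken from the CarGraph.com data set.
data = [
    (20_000, 22_000.0),
    (60_000, 17_500.0),
    (100_000, 13_000.0),
    (140_000, 9_500.0),
]

def linear_price(mileage, a=-0.11, b=24_000.0):
    """Hypothetical candidate model: price falls linearly as mileage grows.
    The constants a and b are illustrative guesses, not fitted values."""
    return a * mileage + b

def constant_price(mileage):
    """Baseline model that ignores mileage and always predicts the mean price."""
    return sum(price for _, price in data) / len(data)

def sum_squared_error(f, points):
    """Add up the squared gaps between the model's prediction f(x) and each observed y."""
    return sum((f(x) - y) ** 2 for x, y in points)

print("linear model cost:  ", sum_squared_error(linear_price, data))
print("constant model cost:", sum_squared_error(constant_price, data))
```

A smaller cost means the function passes closer to the data, so on these made-up points the downward-sloping line scores far better than the flat baseline. Squaring the errors keeps positive and negative misses from canceling each other out.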

14.1 Measuring the quality of fit for a function

14.1.1 Measuring distance from a function

14.1.2 Summing the squares of the errors


14.1.3 Calculating cost for car price functions

14.1.4 Exercises

14.2 Exploring spaces of functions

14.2.1 Picturing cost for lines through the origin

14.2.2 The space of all linear functions

14.2.3 Exercises

14.3 Finding the line of best fit using gradient descent

14.3.1 Rescaling the data

14.3.2 Finding and plotting the line of best fit

14.3.3 Exercises