chapter three

3 Principles of curve fitting

This chapter covers

How to fit a parametric model
What a loss function is and how to use it
Linear regression, the mother of all neural networks
Gradient descent as a tool to optimize a loss function
Implementing gradient descent with different frameworks

DL models became famous because they outperformed traditional machine learning (ML) methods in a wide range of tasks such as computer vision and natural language processing. From the previous chapter, you already know that a critical success factor of DL models is their deep hierarchical architecture. DL models can have millions of tunable parameters, and you might wonder how to tune these so that the models behave optimally. The solution is astonishingly simple, and it’s already used in many methods in traditional ML: you first define a loss function that describes how badly a model performs on the training data, and then you tune the parameters of the model to minimize that loss. This procedure is called fitting.
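To make the idea of fitting concrete, here is a minimal sketch in plain Python. It is not the chapter's own implementation. It assumes a made-up toy data set, a straight-line model y = a*x + b, the mean squared error as the loss, and plain gradient descent with a hand-picked learning rate. All names in it (xs, ys, mse_loss, gradients, fit) are hypothetical and chosen only for illustration.

```python
# Hypothetical toy data: points scattered around a straight line (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]


def mse_loss(a, b):
    """Mean squared error of the line y = a*x + b on the training data."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradients(a, b):
    """Partial derivatives of the MSE loss with respect to a and b."""
    n = len(xs)
    grad_a = sum(2 * (a * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (a * x + b - y) for x, y in zip(xs, ys)) / n
    return grad_a, grad_b


def fit(steps=5000, learning_rate=0.01):
    """Tune a and b by repeatedly stepping against the gradient of the loss."""
    a, b = 0.0, 0.0  # arbitrary starting values (an assumption of this sketch)
    for _ in range(steps):
        grad_a, grad_b = gradients(a, b)
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b


a_fit, b_fit = fit()
print(f"slope={a_fit:.3f}, intercept={b_fit:.3f}, loss={mse_loss(a_fit, b_fit):.4f}")
```

The two parameters a and b play the role that millions of weights play in a DL model: the loss measures how badly the current values describe the training data, and each update nudges them in the direction that reduces it.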

3.1 “Hello world” in curve fitting

3.1.1 Fitting a linear regression model based on a loss function

3.2 Gradient descent method

3.2.1 Loss with one free model parameter

3.2.2 Loss with two free model parameters

3.3 Special DL sauce

3.3.1 Mini-batch gradient descent

3.3.2 Using SGD variants to speed up the learning

3.3.3 Automatic differentiation

3.4 Backpropagation in DL frameworks

3.4.1 Static graph frameworks

3.4.2 Dynamic graph frameworks

Summary