3 Principles of curve fitting
This chapter covers:
- How to fit a parametric model
- What a loss function is and how to use it
- Linear regression, the mother of all neural networks
- Gradient descent as a tool to optimize a loss function
- Implementing gradient descent with different frameworks
DL models became famous because they outperformed traditional machine learning methods on a broad variety of relevant tasks, such as computer vision and natural language processing. From the previous chapter, you already know that a critical success factor of DL models is their deep hierarchical architecture. DL models have millions of tunable parameters, and you might wonder how to tune these parameters so that the model behaves optimally. The solution is astonishingly simple and is already used in many traditional machine learning methods: you first define a loss function that describes how badly a model performs on the training data, and then you tune the parameters of the model to minimize that loss. This procedure is called fitting.
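To make the idea concrete, here is a minimal sketch (not the book's code) of the fitting procedure on the simplest possible model, a line y = w*x + b: a mean-squared-error loss measures how badly the current parameters fit some toy data, and a plain gradient-descent loop tunes w and b to shrink that loss. The function name mse_loss, the learning rate, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

# Toy training data generated from y = 2x + 1 plus noise (assumed for illustration).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=100)

def mse_loss(w, b):
    """Mean squared error: how badly the model y_hat = w*x + b fits the data."""
    y_hat = w * x + b
    return np.mean((y_hat - y) ** 2)

# Gradient descent: repeatedly nudge the parameters against the gradient of the loss.
w, b = 0.0, 0.0
lr = 0.1  # learning rate (assumed value)
for step in range(200):
    y_hat = w * x + b
    grad_w = 2 * np.mean((y_hat - y) * x)  # d(loss)/dw
    grad_b = 2 * np.mean(y_hat - y)        # d(loss)/db
    w -= lr * grad_w
    b -= lr * grad_b

print(f"fitted w={w:.2f}, b={b:.2f}, loss={mse_loss(w, b):.4f}")
```

The same recipe carries over to deep networks: only the model and the number of parameters change, while the loss-then-minimize structure stays the same. The rest of the chapter develops each ingredient in turn.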