3 Principles of curve fitting


This chapter covers

  • How to fit a parametric model
  • What a loss function is and how to use it
  • Linear regression, the mother of all neural networks
  • Gradient descent as a tool to optimize a loss function
  • Implementing gradient descent with different frameworks

DL models became famous because they outperformed traditional machine learning (ML) methods on a broad variety of relevant tasks such as computer vision and natural language processing. From the previous chapter, you already know that a critical success factor of DL models is their deep hierarchical architecture. DL models have millions of tunable parameters, and you might wonder how to tune these so that the models behave optimally. The solution is astonishingly simple and is already used in many traditional ML methods: you first define a loss function that describes how badly a model performs on the training data and then tune the parameters of the model to minimize that loss. This procedure is called fitting.
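To make the idea concrete before we dive in, here is a minimal sketch of fitting in Python. It is not code from this chapter but a preview of sections 3.1 and 3.2: the data, the model (a linear regression with slope a and intercept b), and the helper names predict and mse_loss are all illustrative choices, and the gradients are written out by hand for this simple case.

```python
import numpy as np

# Synthetic training data (illustrative): y is roughly 2*x + 1 plus noise
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.shape)

def predict(x, a, b):
    """Linear model with two tunable parameters: slope a and intercept b."""
    return a * x + b

def mse_loss(a, b):
    """Mean squared error: how badly the model performs on the training data."""
    return np.mean((predict(x, a, b) - y) ** 2)

# Fitting: tune the parameters to minimize the loss via gradient descent
a, b = 0.0, 0.0  # arbitrary starting values
lr = 0.1         # learning rate (step size)
for step in range(500):
    residual = predict(x, a, b) - y
    grad_a = np.mean(2 * residual * x)  # d(loss)/da
    grad_b = np.mean(2 * residual)      # d(loss)/db
    a -= lr * grad_a                    # step downhill in the loss landscape
    b -= lr * grad_b

print(f"fitted a={a:.2f}, b={b:.2f}, loss={mse_loss(a, b):.4f}")
```

Everything in this chapter elaborates on this loop: what the loss function should be, how gradient descent finds the minimum, and how DL frameworks compute the gradients automatically instead of by hand.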

3.1 “Hello world” in curve fitting

3.1.1 Fitting a linear regression model based on a loss function

3.2 Gradient descent method

3.2.1 Loss with one free model parameter

3.2.2 Loss with two free model parameters

3.3 Special DL sauce

3.3.1 Mini-batch gradient descent

3.3.2 Using SGD variants to speed up learning

3.3.3 Automatic differentiation

3.4 Backpropagation in DL frameworks

3.4.1 Static graph frameworks

3.4.2 Dynamic graph frameworks

Summary