
DL models became famous because they outperformed traditional machine learning (ML) methods in a broad variety of relevant tasks such as computer vision and natural language processing. From the previous chapter, you already know that a critical success factor of DL models is their deep hierarchical architecture. DL models have millions of tunable parameters, and you might wonder how to tune these so that the models behave optimally. The solution is astonishingly simple. It’s already used in many methods in traditional ML: you first define a loss function that describes how badly a model performs on the training data and then tune the parameters of the model to minimize the loss. This procedure is called fitting.