Concept: training loss (category: deep learning)

This is an excerpt from Manning's book Deep Learning with PyTorch.
The training loss will tell us whether our model can fit the training set at all: in other words, whether our model has enough capacity to process the relevant information in the data. If our mysterious thermometer somehow managed to measure temperatures using a logarithmic scale, our poor linear model would not have had a chance to fit those measurements and provide us with a sensible conversion to Celsius. In that case, our training loss (the loss we were printing in the training loop) would stop decreasing well before approaching zero.
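To see what a capacity problem looks like in practice, here is a small, hypothetical stand-in for the logarithmic-thermometer scenario: the ground truth is quadratic rather than logarithmic, but the effect on a two-parameter linear model is the same. Everything in this snippet (the data, the epoch count, the learning rate) is made up for illustration and is not part of the book's example.

```python
import torch

# Hypothetical illustration, not the book's data: the true input/output relation
# is quadratic, so a straight line (our two-parameter model) cannot fit it and
# the training loss plateaus well above zero instead of approaching it.
x = torch.linspace(-3.0, 3.0, 20)
y = x ** 2                                     # "mystery sensor": nonlinear ground truth

params = torch.tensor([1.0, 0.0], requires_grad=True)
optimizer = torch.optim.SGD([params], lr=1e-2)

for epoch in range(1, 3001):
    y_p = params[0] * x + params[1]            # linear model w * x + b
    loss = ((y_p - y) ** 2).mean()             # mean squared error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if epoch % 1000 == 0:
        print(f"Epoch {epoch}, Training loss {loss.item():.4f}")
```

No matter how long we train, the loss levels off around the variance a straight line cannot explain; a plateau like that is the cue to suspect insufficient capacity (or uninformative inputs) rather than a bug in the training loop.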
A deep neural network can potentially approximate complicated functions, provided that the number of neurons, and therefore parameters, is high enough. The fewer the parameters, the simpler the shape of the function our network will be able to approximate. So, rule 1: if the training loss is not decreasing, chances are the model is too simple for the data. The other possibility is that our data just doesn't contain the information needed to explain the output: if the nice folks at the shop sell us a barometer instead of a thermometer, we will have little chance of predicting temperature in Celsius from pressure alone, even if we use the latest neural network architecture from Quebec (www.umontreal.ca/en/artificialintelligence).
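The listing below compares a training loss with a validation loss, and it relies on names defined earlier in the chapter: the linear `model`, the mean-squared-error `loss_fn`, and the tensors `train_t_un`, `val_t_un`, `train_t_c`, and `val_t_c` produced by a random train/validation split of the thermometer readings. As a reminder, here is a minimal sketch of that setup (the split is random, so the exact indices, and therefore the exact losses, will differ from run to run):

```python
import torch
import torch.optim as optim

# The thermometer readings introduced earlier in the chapter.
t_c = torch.tensor([0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0, 6.0, 13.0, 21.0])
t_u = torch.tensor([35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4])

def model(t_u, w, b):          # the linear model used throughout the chapter
    return w * t_u + b

def loss_fn(t_p, t_c):         # mean squared error
    return ((t_p - t_c) ** 2).mean()

n_samples = t_u.shape[0]
n_val = int(0.2 * n_samples)   # hold out roughly 20% of the samples for validation

shuffled_indices = torch.randperm(n_samples)  # random split, different every run
train_indices = shuffled_indices[:-n_val]
val_indices = shuffled_indices[-n_val:]

train_t_u, train_t_c = t_u[train_indices], t_c[train_indices]
val_t_u, val_t_c = t_u[val_indices], t_c[val_indices]

train_t_un = 0.1 * train_t_u   # the rescaled ("normalized") inputs used below
val_t_un = 0.1 * val_t_u
```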
```python
# In[14]:
# model, loss_fn, and the train/validation tensors are defined earlier in the
# chapter (see the sketch above).
def training_loop(n_epochs, optimizer, params, train_t_u, val_t_u,
                  train_t_c, val_t_c):
    for epoch in range(1, n_epochs + 1):
        train_t_p = model(train_t_u, *params)    # forward pass on the training set
        train_loss = loss_fn(train_t_p, train_t_c)

        val_t_p = model(val_t_u, *params)        # forward pass on the validation set
        val_loss = loss_fn(val_t_p, val_t_c)

        optimizer.zero_grad()
        train_loss.backward()                    # only the training loss is backpropagated
        optimizer.step()

        if epoch <= 3 or epoch % 500 == 0:
            print(f"Epoch {epoch}, Training loss {train_loss.item():.4f},"
                  f" Validation loss {val_loss.item():.4f}")

    return params

# In[15]:
params = torch.tensor([1.0, 0.0], requires_grad=True)
learning_rate = 1e-2
optimizer = optim.SGD([params], lr=learning_rate)

training_loop(
    n_epochs = 3000,
    optimizer = optimizer,
    params = params,
    train_t_u = train_t_un,                      # train and validate on the rescaled inputs
    val_t_u = val_t_un,
    train_t_c = train_t_c,
    val_t_c = val_t_c)
```

```
# Out[15]:
Epoch 1, Training loss 66.5811, Validation loss 142.3890
Epoch 2, Training loss 38.8626, Validation loss 64.0434
Epoch 3, Training loss 33.3475, Validation loss 39.4590
Epoch 500, Training loss 7.1454, Validation loss 9.1252
Epoch 1000, Training loss 3.5940, Validation loss 5.3110
Epoch 1500, Training loss 3.0942, Validation loss 4.1611
Epoch 2000, Training loss 3.0238, Validation loss 3.7693
Epoch 2500, Training loss 3.0139, Validation loss 3.6279
Epoch 3000, Training loss 3.0125, Validation loss 3.5756

tensor([ 5.1964, -16.7512], requires_grad=True)
```

Here we are not being entirely fair to our model. The validation set is really small, so the validation loss will only be meaningful up to a point. In any case, we note that the validation loss is higher than our training loss, although not by an order of magnitude. We expect a model to perform better on the training set, since its parameters are being shaped by the training set. Our main goal is to see both the training loss and the validation loss decreasing. While ideally both losses would be roughly the same value, as long as the validation loss stays reasonably close to the training loss, we know that our model is continuing to learn generalized things about our data. In figure 5.14, case C is ideal, while D is acceptable. In case A, the model isn't learning at all; and in case B, we see overfitting. We'll see more meaningful examples of overfitting in chapter 12.
Figure 5.14 Overfitting scenarios when looking at the training (solid line) and validation (dotted line) losses. (A) Training and validation losses do not decrease; the model is not learning due to no information in the data or insufficient capacity of the model. (B) Training loss decreases while validation loss increases: overfitting. (C) Training and validation losses decrease exactly in tandem. Performance may be improved further as the model is not at the limit of overfitting. (D) Training and validation losses have different absolute values but similar trends: overfitting is under control.
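Figure 5.14 is a schematic, but we can draw the same kind of picture for our own run. One way to do so (not part of the book's listing) is to record both losses at every epoch and plot them afterward; the sketch below assumes matplotlib is installed and reuses `model`, `loss_fn`, and the split tensors from the listings above, with freshly initialized parameters.

```python
import torch
import torch.optim as optim
import matplotlib.pyplot as plt

# Sketch: track the two losses per epoch so their trends can be compared
# visually, in the spirit of figure 5.14. Reuses model, loss_fn, train_t_un,
# val_t_un, train_t_c, and val_t_c from the listings above.
params = torch.tensor([1.0, 0.0], requires_grad=True)
optimizer = optim.SGD([params], lr=1e-2)

train_losses, val_losses = [], []
for epoch in range(1, 3001):
    train_loss = loss_fn(model(train_t_un, *params), train_t_c)
    val_loss = loss_fn(model(val_t_un, *params), val_t_c)
    train_losses.append(train_loss.item())
    val_losses.append(val_loss.item())

    optimizer.zero_grad()
    train_loss.backward()        # as before, only the training loss drives the update
    optimizer.step()

plt.plot(train_losses, label="training loss")                      # solid line
plt.plot(val_losses, linestyle="dotted", label="validation loss")  # dotted line
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()
```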