Concept: learning rate (category: Keras)

This is an excerpt from Manning's book Probabilistic Deep Learning: With Python, Keras and TensorFlow Probability MEAP V06.
Figure 3.5 The loss from equation 3.1 plotted versus the free regression model parameter a. It shows the results of 5 steps of gradient descent, starting at a = 0.5, for different learning rates. With a learning rate of 0.0001, the minimum is approximately reached within 5 steps without overshooting. With a learning rate of 0.0003, the minimum is also reached after approximately 5 steps, but the position of the minimum is overshot twice. With a learning rate of 0.00045, the updates of a always overshoot the position of the minimum. In this case, the updated values of a move farther and farther away from the minimum, and the corresponding loss grows without bounds.
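The behavior shown in figure 3.5 is easy to reproduce. The following is a minimal sketch that assumes a mean squared error loss for a no-intercept linear model y = a * x on made-up data; the actual loss of equation 3.1 and the data are defined in the book, and the learning rates below are chosen for this toy data rather than being the 0.0001/0.0003/0.00045 values from the figure. A small rate converges slowly, a medium rate overshoots but still converges, and a too-large rate drives a (and the loss) away from the minimum.

```python
import numpy as np

# Made-up data for a no-intercept linear model y_hat = a * x.
# The actual data and the loss of equation 3.1 are defined in the book.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

def loss(a):
    # Mean squared error, used here as a stand-in for equation 3.1.
    return np.mean((y - a * x) ** 2)

def grad(a):
    # Derivative of the MSE above with respect to a.
    return np.mean(-2.0 * x * (y - a * x))

# Illustrative learning rates for this toy data:
# small (slow), medium (overshoots but converges), too large (diverges).
for lr in (0.001, 0.05, 0.2):
    a = 0.5                  # same starting value as in figure 3.5
    for step in range(5):    # 5 gradient-descent updates
        a = a - lr * grad(a)
    print(f"lr={lr}: a after 5 steps = {a:.3f}, loss = {loss(a):.3f}")
```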
In figure 3.5 you can see that the learning rate is a critical hyperparameter. If you choose a value that is too small, many update steps are needed to find the optimal model parameter. However, if the learning rate is too large (see figure 3.5, right panel), it’s impossible to converge to the optimal parameter value a for which the loss is minimal. Instead, the loss increases with each update, which leads to numerical problems and eventually to NaNs or infinite values. When you observe that the loss becomes infinite or NaN, there’s a saying that goes “keep calm and lower your learning rate.” So the next time you see the loss on the training set growing instead of decreasing, try lowering the learning rate (dividing it by 10 is a good first guess).
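In Keras, the learning rate is an argument of the optimizer you pass to model.compile. The following sketch, with an illustrative toy model and numbers that are not from the book, shows where the rate is set and how you would divide it by 10 after observing a diverging loss.

```python
from tensorflow import keras

# Toy one-parameter regression model; architecture is illustrative only.
model = keras.Sequential([
    keras.Input(shape=(1,)),
    keras.layers.Dense(1),
])

# The learning rate is a constructor argument of the optimizer.
# If the training loss grows or turns into NaN/inf with, say, 0.01,
# recompile with the rate divided by 10 (here 0.001) and train again.
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.001),
              loss="mse")
```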