What does the penalty we add to the least squares estimate actually look like? Two penalties are frequently used: the L1 norm and the L2 norm. I’ll start by showing you what the L2 norm is and how it works, because it is the regularization method used in ridge regression. Then I’ll extend this to show you how LASSO uses the L1 norm, and how elastic net combines both the L1 and L2 norms.
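To make these concrete (the notation here is the standard definition, not taken from the excerpt): if β₁, …, βₚ are the model’s slope parameters, the two penalties are

$$\|\beta\|_1 = \sum_{j=1}^{p} \lvert \beta_j \rvert \qquad \text{and} \qquad \|\beta\|_2^2 = \sum_{j=1}^{p} \beta_j^2,$$

where LASSO adds the first to the least squares loss and ridge regression adds the second (strictly speaking, the square of the L2 norm, though it is commonly referred to simply as the L2 norm).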
In this section, I’ll give you a mathematical and graphical explanation of the L2 norm, show how ridge regression uses it, and explain why you would use it. Imagine that you want to predict how busy your local park will be, depending on the temperature that day. An example of what this data might look like is shown in figure 11.4.
Ridge regression modifies the least squares loss function slightly to include a term that makes the function’s value larger, the larger the parameter estimates are. As a result, the algorithm now has to balance selecting the model parameters that minimize the sum of squares against selecting parameters that minimize this new penalty. In ridge regression, this penalty is called the L2 norm, and it is very easy to calculate: we simply square all of the model parameters (all except the intercept) and add them up. When we have only one continuous predictor, we have only one parameter (the slope), so the L2 norm is its square. When we have two predictors, we square each slope and add the squares together, and so on. This is illustrated for our park example in figure 11.5.
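As a minimal sketch (this code isn’t from the book, and the park data here is made up for illustration), you could compute this penalty by hand from an ordinary linear model in R:

# Hypothetical park data: temperature (predictor) and visitor count (outcome)
park <- data.frame(
  temperature = c(5, 10, 15, 20, 25, 30),
  visitors    = c(12, 30, 55, 80, 110, 115)
)

# Ordinary least squares fit: an intercept plus one slope
model <- lm(visitors ~ temperature, data = park)

# The L2 norm squares every parameter except the intercept and sums them;
# with a single predictor, that is just the squared slope
slopes     <- coef(model)[-1]   # drop the intercept
l2_penalty <- sum(slopes^2)

# Ridge regression minimizes: sum of squared residuals + lambda * L2 penalty.
# Here we simply evaluate that loss at the OLS estimates for an arbitrary
# lambda; ridge itself would search for the parameters that minimize it
lambda     <- 0.5
ridge_loss <- sum(residuals(model)^2) + lambda * l2_penalty

In practice you would let a package such as glmnet perform this search over parameter values for you; computing the penalty once by hand just makes clear how it grows with the size of the slopes.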