concept lognormal distribution in category R

Practical Data Science with R, Second Edition

This is an excerpt from Manning's book Practical Data Science with R, Second Edition.

Recall from the discussion of the lognormal distribution in section 4.2 that it’s often useful to log transform monetary quantities. The log transform is also compatible with our original task of predicting incomes with a relative error (meaning that an error of a given dollar size counts more against a small income than against a large one). The glm() methods of section 7.2 can be used to avoid the log transform and predict in such a way as to minimize squared errors (so being off by $50,000 would be considered the same error for both large and small incomes).
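To make the link between the log transform and relative error concrete, here is a minimal sketch. The two incomes are made-up illustrative values, not data from the book. It shows that predictions that are off by the same percentage have different absolute errors but the same error on the log scale.

```r
# Two hypothetical incomes (illustrative values only, not from the book)
actual <- c(20000, 200000)

# Suppose both predictions are 10% too high
predicted <- actual * 1.1

# Absolute errors differ by a factor of 10
abs_error <- predicted - actual

# Errors on the log scale are identical, because
# log(predicted) - log(actual) = log(predicted / actual) = log(1.1)
log_error <- log(predicted) - log(actual)

print(abs_error)
print(log_error)
```

Minimizing squared error on log(income) therefore penalizes relative misses equally, whatever the size of the income.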

The lognormal distribution is the distribution of a random variable X whose natural log log(X) is normally distributed. The distribution of highly skewed positive data, like the value of profitable customers, incomes, sales, or stock prices, can often be modeled as a lognormal distribution. A lognormal distribution is defined over all positive real numbers; as shown in figure B.4 (top), it’s asymmetric, with a long tail out toward positive infinity. The distribution of log(X) (figure B.4, bottom) is a normal distribution centered at mean(log(X)). For lognormal populations, the mean is generally much higher than the median, and the bulk of the contribution toward the mean value is due to a small population of highest-valued data points.
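The gap between mean and median can be checked numerically. The sketch below relies on the standard closed forms for the lognormal distribution (median = exp(meanlog), mean = exp(meanlog + sdlog^2 / 2)), which are textbook results rather than something stated in this excerpt. The seed and sample size are arbitrary choices made for reproducibility.

```r
meanlog <- 0
sdlog <- 1

# Standard closed forms for the lognormal distribution
theoretical_median <- exp(meanlog)
theoretical_mean <- exp(meanlog + sdlog^2 / 2)

# qlnorm() is R's lognormal quantile function; its 0.5 quantile is the median
median_from_qlnorm <- qlnorm(0.5, meanlog = meanlog, sdlog = sdlog)

# Share of the sample total contributed by the top 10% of values
set.seed(2024)  # arbitrary seed, chosen only for reproducibility
u <- rlnorm(100000, meanlog = meanlog, sdlog = sdlog)
top_share <- sum(u[u > quantile(u, 0.9)]) / sum(u)

print(c(median = theoretical_median, mean = theoretical_mean))
print(median_from_qlnorm)
print(top_share)
```

With these parameters the mean sits well above the median, and the top tenth of the sample accounts for a disproportionate share of the total.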

Using the lognormal distribution in R

Let’s look at the functions for working with the lognormal distribution in R (see also section B.5.3). We’ll start with dlnorm() and rlnorm():

  • dlnorm(x, meanlog = m, sdlog = s) is the probability density function (PDF): it returns the density at the value x for a lognormal distribution X such that mean(log(X)) = m and sd(log(X)) = s. By default, meanlog = 0 and sdlog = 1 for all the functions discussed in this section.
  • rlnorm(n, meanlog = m, sdlog = s) is the random number generator that returns n values drawn from a lognormal distribution with mean(log(X)) = m and sd(log(X)) = s.

We can use dlnorm() and rlnorm() to produce figure B.4, shown earlier. The following listing demonstrates some properties of the lognormal distribution.

    Listing B.5. Demonstrating some properties of the lognormal distribution
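    library(ggplot2)  # provides ggplot(), geom_density(), and geom_line() used below
    
    # draw 1001 samples from a lognormal with meanlog 0, sdlog 1
    # (no seed is set, so the values printed below will vary from run to run)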
    u <- rlnorm(1001)
    
    # the mean of u is higher than the median
    mean(u)
    # [1] 1.638628
    median(u)
    # [1] 1.001051
    
    # the mean of log(u) is approx meanlog=0
    mean(log(u))
    # [1] -0.002942916
    
    # the sd of log(u) is approx sdlog=1
    sd(log(u))
    # [1] 0.9820357
    
    # generate the lognormal with meanlog = 0, sdlog = 1
    x <- seq(from = 0, to = 25, length.out = 500)
    f <- dlnorm(x)
    
    # generate a normal with mean = 0, sd = 1
    x2 <- seq(from = -5, to = 5, length.out = 500)
    f2 <- dnorm(x2)
    
    # make data frames
    lnormframe <- data.frame(x = x, y = f)
    normframe <- data.frame(x = x2, y = f2)
    dframe <- data.frame(u=u)
    
    # plot density plots with theoretical curves superimposed
    p1 <- ggplot(dframe, aes(x = u)) + geom_density() +
      geom_line(data = lnormframe, aes(x = x, y = y), linetype = 2)
    
    p2 <- ggplot(dframe, aes(x = log(u))) + geom_density() +
      geom_line(data = normframe, aes(x = x, y = y), linetype = 2)
    
    # function to plot multiple plots on one page, stacked vertically
    library(grid)
    nplot <- function(plist) {
      n <- length(plist)
      grid.newpage()
      pushViewport(viewport(layout = grid.layout(n, 1)))
      vplayout <- function(x, y) {
        viewport(layout.pos.row = x, layout.pos.col = y)
      }
      for (i in seq_len(n)) {
        print(plist[[i]], vp = vplayout(i, 1))
      }
    }
    
    # this is the plot that leads this section.
    nplot(list(p1, p2))
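
As a complement to the listing above, the following sketch checks the defining relationship between the lognormal and normal distributions directly. It uses the standard change-of-variables identity dlnorm(x, m, s) = dnorm(log(x), m, s) / x, which is not stated in this excerpt. The evaluation points and the seed are arbitrary illustrative choices.

```r
x <- c(0.5, 1, 2, 5)  # arbitrary positive evaluation points
m <- 0
s <- 1

# Change of variables: the density of X is the density of log(X) divided by x
via_dnorm <- dnorm(log(x), mean = m, sd = s) / x
direct <- dlnorm(x, meanlog = m, sdlog = s)

print(all.equal(direct, via_dnorm))

# Exponentiating normal draws yields lognormal draws
set.seed(1)  # arbitrary seed, chosen only for reproducibility
v <- exp(rnorm(1001, mean = m, sd = s))
print(median(v))  # should land near exp(m)
```

This is why the density of log(u) in the listing lines up with the standard normal curve.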
    Figure B.5. The 75th percentile of the lognormal distribution with meanlog = 0, sdlog = 1
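
The percentile named in the figure caption can be computed with R's lognormal quantile function, qlnorm(). The sketch below assumes the default parameters meanlog = 0 and sdlog = 1, and uses plnorm(), the cumulative distribution function, to confirm the result.

```r
# 75th percentile of the lognormal with meanlog = 0, sdlog = 1
q75 <- qlnorm(0.75, meanlog = 0, sdlog = 1)

# plnorm() is the cumulative distribution function, so it inverts qlnorm()
p <- plnorm(q75, meanlog = 0, sdlog = 1)

# Equivalent route through the normal quantile function
q75_via_normal <- exp(qnorm(0.75))

print(q75)
print(p)
print(q75_via_normal)
```

The two routes agree because a quantile of X is the exponential of the matching quantile of log(X).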