concept `lognormal distribution` in category `R`

appears as: lognormal distribution, The lognormal distribution, lognormal distributions

Practical Data Science with R, Second Edition

This is an excerpt from Manning's book Practical Data Science with R, Second Edition.

¹Recall from the discussion of the lognormal distribution in section 4.2 that it’s often useful to log transform monetary quantities. The log transform is also compatible with our original task of predicting incomes with a relative error (meaning large errors count more against small incomes). The glm() methods of section 7.2 can be used to avoid the log transform and predict in such a way as to minimize square errors (so being off by $50,000 would be considered the same error for both large and small incomes).
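To make the relative-error point concrete, here is a minimal sketch. The income and prediction values are made up for illustration and are not taken from the book's data. It shows that a prediction off by a factor of 2 gives the same error on the log scale for a small and a large income, while a fixed $50,000 miss is a much larger log-scale error for the small income.

```r
# Hypothetical incomes and predictions (illustrative values only).
actual    <- c(20000, 200000)
predicted <- c(40000, 400000)   # both predictions are off by a factor of 2

# On the original scale, the absolute errors differ by a factor of 10.
abs_error <- predicted - actual

# On the log scale the two errors are identical, because
# log(predicted) - log(actual) equals log(predicted / actual).
log_error <- log(predicted) - log(actual)

# A fixed $50,000 miss, measured on the log scale, for each income.
log_error_fixed <- log(actual + 50000) - log(actual)

print(abs_error)
print(log_error)
print(log_error_fixed)
```

Minimizing squared error on log(income) therefore penalizes a given dollar error more heavily when the income is small, which is the behavior described above.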

to see more go to 7.1.2. Building a linear regression model

The lognormal distribution is the distribution of a random variable X whose natural log log(X) is normally distributed. The distribution of highly skewed positive data, like the value of profitable customers, incomes, sales, or stock prices, can often be modeled as a lognormal distribution. A lognormal distribution is defined over all positive real numbers; as shown in figure B.4 (top), it’s asymmetric, with a long tail out toward positive infinity. The distribution of log(X) (figure B.4, bottom) is a normal distribution centered at mean(log(X)). For lognormal populations, the mean is generally much higher than the median, and the bulk of the contribution toward the mean value is due to a small population of highest-valued data points.
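The definition can be checked directly with a short simulation. This is a sketch, not code from the book: the seed, the sample size, and the 20% cutoff are arbitrary choices made for illustration. It relies on two standard facts about a lognormal with parameters m and s: its median is exp(m) and its mean is exp(m + s^2 / 2).

```r
set.seed(2019)  # arbitrary seed, only so the run is repeatable

m <- 0  # mean of log(X)
s <- 1  # standard deviation of log(X)

# Exponentiating normal draws gives lognormal draws.
x <- exp(rnorm(10000, mean = m, sd = s))

# Theoretical median and mean of the lognormal.
theoretical_median <- exp(m)
theoretical_mean   <- exp(m + s^2 / 2)

# The sample mean sits well above the sample median.
print(c(mean = mean(x), median = median(x)))
print(c(theoretical_mean = theoretical_mean,
        theoretical_median = theoretical_median))

# Fraction of the total that comes from the largest 20% of the values
# (the 20% cutoff is an illustrative choice).
top_share <- sum(x[x >= quantile(x, 0.8)]) / sum(x)
print(top_share)
```

Comparing top_share with 0.2 shows how far the largest values dominate the total, and therefore the mean.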

to see more go to Appendix B. Important statistical concepts

Using the lognormal distribution in R

Let’s look at the functions for working with the lognormal distribution in R (see also section B.5.3). We’ll start with dlnorm() and rlnorm():

dlnorm(x, meanlog = m, sdlog = s) is the probability density function (PDF) that returns the probability density at the value x for a lognormal distribution X such that mean(log(X)) = m and sd(log(X)) = s. By default, meanlog = 0 and sdlog = 1 for all the functions discussed in this section.

rlnorm(n, meanlog = m, sdlog = s) is the random number generator that returns n values drawn from a lognormal distribution with mean(log(X)) = m and sd(log(X)) = s.
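As a quick check on how these two functions behave, here is a small sketch. The parameter values, seed, and evaluation points are arbitrary and chosen only for illustration. It shows that dlnorm() returns a density (it integrates to 1 and can exceed 1 at a point), that it matches the normal density of log(x) divided by x, and that the logs of rlnorm() draws have roughly the requested mean and standard deviation.

```r
set.seed(42)  # arbitrary seed, only so the run is repeatable

m <- 0.5   # illustrative meanlog
s <- 0.75  # illustrative sdlog

# The density integrates to 1 over the positive reals.
total_area <- integrate(dlnorm, lower = 0, upper = Inf,
                        meanlog = m, sdlog = s)$value

# A density value is not a probability: with a small sdlog it exceeds 1.
peak_density <- dlnorm(1, meanlog = 0, sdlog = 0.1)

# Change of variables: dlnorm(x) equals dnorm(log(x)) / x.
x <- c(0.5, 1, 2, 5)
d_direct     <- dlnorm(x, meanlog = m, sdlog = s)
d_via_normal <- dnorm(log(x), mean = m, sd = s) / x

# The logs of rlnorm() draws should have mean near m and sd near s.
draws <- rlnorm(5000, meanlog = m, sdlog = s)

print(total_area)
print(peak_density)
print(cbind(x, d_direct, d_via_normal))
print(c(mean_log = mean(log(draws)), sd_log = sd(log(draws))))
```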

We can use dlnorm() and rlnorm() to produce figure B.4, shown earlier. The following listing demonstrates some properties of the lognormal distribution.
Listing B.5. Demonstrating some properties of the lognormal distribution
# draw 1001 samples from a lognormal with meanlog 0, sdlog 1
u <- rlnorm(1001)

# the mean of u is higher than the median
mean(u)
# [1] 1.638628
median(u)
# [1] 1.001051

# the mean of log(u) is approx meanlog=0
mean(log(u))
# [1] -0.002942916

# the sd of log(u) is approx sdlog=1
sd(log(u))
# [1] 0.9820357

# generate the lognormal with meanlog = 0, sdlog = 1
x <- seq(from = 0, to = 25, length.out = 500)
f <- dlnorm(x)

# generate a normal with mean = 0, sd = 1
x2 <- seq(from = -5, to = 5, length.out = 500)
f2 <- dnorm(x2)

# make data frames
lnormframe <- data.frame(x = x, y = f)
normframe <- data.frame(x = x2, y = f2)
dframe <- data.frame(u=u)

library(ggplot2)   # provides ggplot(), geom_density(), and geom_line()
# plot density plots with theoretical curves superimposed
p1 <- ggplot(dframe, aes(x = u)) + geom_density() +
  geom_line(data = lnormframe, aes(x = x, y = y), linetype = 2)

p2 <- ggplot(dframe, aes(x = log(u))) + geom_density() +
  geom_line(data = normframe, aes(x = x,y = y), linetype = 2)

# functions to plot multiple plots on one page
library(grid)
nplot <- function(plist) {
  n <- length(plist)
  grid.newpage()
  pushViewport(viewport(layout=grid.layout(n, 1)))
  vplayout<-
     function(x,y) { viewport(layout.pos.row = x, layout.pos.col = y) }
  for(i in 1:n) {
    print(plist[[i]], vp = vplayout(i, 1))
  }
}

# this is the plot that leads this section.
nplot(list(p1, p2))

Figure B.5. The 75th percentile of the lognormal distribution with meanlog = 0, sdlog = 1
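The excerpt doesn't include the code behind this figure. As a minimal sketch, assuming the figure uses the default parameters (meanlog = 0, sdlog = 1), the 75th percentile can be computed with base R's qlnorm(), the quantile function for the lognormal, and checked with plnorm(), its cumulative distribution function.

```r
# 75th percentile of the lognormal with the default meanlog = 0, sdlog = 1
# (the choice of default parameters is an assumption about the figure).
q75 <- qlnorm(0.75)

# plnorm() is the cumulative distribution function, so it inverts qlnorm():
# 75% of the probability mass lies at or below q75.
p <- plnorm(q75)

# A lognormal quantile is the exponential of the matching normal quantile.
q75_via_normal <- exp(qnorm(0.75))

print(q75)
print(p)
print(q75_via_normal)
```

Because exp() is monotone, any quantile of the lognormal can be obtained by exponentiating the corresponding quantile of the underlying normal.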
