concept density in category R

appears as: density, density, densities
Machine Learning with R, the tidyverse, and mlr

This is an excerpt from Manning's book Machine Learning with R, the tidyverse, and mlr.

Figure 4.10. Faceted plot of Survived against FamSize and Fare. Violin plots show the density of data along the y-axis. The lines on each violin represent the first quartile, median, and third quartile (from lowest to highest).

In the ggplot() function call, we supply Survived as the x aesthetic and Value as the y aesthetic (coercing it into a numeric vector with as.numeric() because it was converted into a character by our gather() function call earlier). Next—and here’s the cool bit—we ask ggplot2 to facet by the Variable column, using the facet_wrap() function, and allow the y-axis to vary between the facets. Faceting allows us to draw subplots of our data, indexed by some faceting variable. Finally, we add a violin geometric object, which is similar to a box plot but also shows the density of data along the y-axis. The resulting plot is shown in figure 4.10.

Figure 4.10. Faceted plot of Survived against FamSize and Fare. Violin plots show the density of data along the y-axis. The lines on each violin represent the first quartile, median, and third quartile (from lowest to highest).

In the last two chapters, we saw how k-means and hierarchical clustering identify clusters using distance: distance between cases, and distance between cases and their centroids. Density-based clustering comprises a set of algorithms that, as the name suggests, uses the density of cases to assign cluster membership. There are multiple ways of measuring density, but we can define it as the number of cases per unit volume of our feature space. Areas of the feature space containing many cases packed closely together can be said to have high density, whereas areas of the feature space that contain few or no cases can be said to have low density. Our intuition here states that distinct clusters in a dataset will be represented by regions of high density, separated by regions of low density. Density-based clustering algorithms attempt to learn these distinct regions of high density and partition them into clusters. Density-based clustering algorithms have several nice properties that circumvent some of the limitations of k-means and hierarchical clustering.

sitemap

Unable to load book!

The book could not be loaded.

(try again in a couple of minutes)

manning.com homepage
test yourself with a liveTest