DBSCAN in R

This is an excerpt from Manning's book Machine Learning with R, the tidyverse, and mlr.
If the correct number of clusters is difficult for you to determine, it could be that there simply aren’t well-defined clusters in the data, or you may need to do further exploration, including generating more data. It may be worth trying a different clustering method: for example, one that, unlike k-means, isn’t limited to finding spherical clusters, or one that can exclude outliers (like DBSCAN, which you’ll meet in chapter 18).
If separating cases into a noise cluster isn’t desirable for your application (but using DBSCAN or OPTICS is), you can use a heuristic such as assigning each noise point to the cluster with the nearest centroid, or to the cluster most common among its k-nearest neighbors.
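As a rough illustration of the centroid heuristic, the sketch below clusters the four numeric iris measurements (a stand-in dataset, not one used in the book) with the dbscan package and then assigns every noise case to the cluster whose centroid is nearest. The eps and minPts values are illustrative rather than tuned.

```r
library(dbscan)

# Stand-in data: the four numeric iris measurements
irisMatrix <- as.matrix(iris[, 1:4])

# Cluster with illustrative (untuned) eps and minPts values
dbsRes <- dbscan(irisMatrix, eps = 0.4, minPts = 5)

# Centroid of each non-noise cluster (cluster 0 is the noise cluster)
clusterIds <- setdiff(unique(dbsRes$cluster), 0)
centroids  <- sapply(clusterIds, function(k) {
  colMeans(irisMatrix[dbsRes$cluster == k, , drop = FALSE])
})

# Reassign each noise case to the cluster with the nearest centroid
reassigned <- dbsRes$cluster
for (i in which(dbsRes$cluster == 0)) {
  sqDists       <- colSums((centroids - irisMatrix[i, ])^2)
  reassigned[i] <- clusterIds[which.min(sqDists)]
}

# Compare the original memberships with the reassigned ones
table(original = dbsRes$cluster, reassigned = reassigned)
```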
All three of these advantages can be seen in figure 18.1. The three subplots each show the same data, clustered using DBSCAN, k-means (Hartigan-Wong algorithm), or hierarchical clustering (complete linkage). This dataset is certainly strange, and you might think you’re unlikely to encounter real-world data like it, but it illustrates the advantages of density-based clustering over k-means and hierarchical clustering. The clusters in the data have very different shapes (which certainly aren’t spherical) and diameters. While k-means and hierarchical clustering learn clusters that bisect and merge these real clusters, DBSCAN is able to faithfully find each shape as a distinct cluster. Additionally, notice that k-means and hierarchical clustering place every single case into a cluster. DBSCAN instead creates cluster “0”, into which it places any cases it considers to be noise. In this case, all cases outside those geometrically shaped clusters are placed in the noise cluster. If you look carefully, though, you may notice a sine wave in the data that all three fail to identify as a cluster.
Figure 18.1. A challenging clustering problem. The dataset shown in each facet contains clusters of varying shapes and diameters, with cases that could be considered noise. The three subplots show the data clustered using DBSCAN, hierarchical clustering (complete linkage), and k-means (Hartigan-Wong). Of the three algorithms used, only DBSCAN is able to faithfully represent these shapes as distinct clusters.
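If you want to see a comparison along these lines for yourself, the sketch below is one way to do it. It is not the book’s dataset or plotting code: it builds a toy dataset of two concentric rings plus uniform noise, clusters it with DBSCAN, k-means (Hartigan-Wong), and hierarchical clustering (complete linkage), and plots one result at a time. The eps and minPts values are chosen by eye for this synthetic data.

```r
library(dbscan)
library(ggplot2)

set.seed(123)

# Toy data: two concentric rings plus 100 uniformly scattered noise points
theta <- runif(600, 0, 2 * pi)
toy <- data.frame(
  x = c(cos(theta[1:300]), 3 * cos(theta[301:600]), runif(100, -4, 4)),
  y = c(sin(theta[1:300]), 3 * sin(theta[301:600]), runif(100, -4, 4))
)

# Cluster the same data with all three algorithms
toy$DBSCAN <- factor(dbscan(toy[, c("x", "y")], eps = 0.5, minPts = 5)$cluster)
toy$kMeans <- factor(kmeans(toy[, c("x", "y")], centers = 2,
                            algorithm = "Hartigan-Wong")$cluster)
toy$hClust <- factor(cutree(hclust(dist(toy[, c("x", "y")]),
                                   method = "complete"), k = 2))

# Plot one result; swap the colour aesthetic to compare the other methods
ggplot(toy, aes(x, y, col = DBSCAN)) +
  geom_point() +
  theme_bw()
```

Only the DBSCAN column separates the two rings and labels the scattered points as cluster 0; k-means and hierarchical clustering each cut straight through the rings, because every case must belong to one of the two requested clusters.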
Use dbscan() to cluster our unscaled data:
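The excerpt cuts off before the book’s own data object and parameter choices, so the sketch below only shows the call pattern of dbscan::dbscan(), using the built-in faithful dataset and placeholder eps and minPts values.

```r
library(dbscan)

# Placeholder data and hyperparameters: the book's own data object and its
# chosen eps/minPts are not shown in this excerpt
dbsClust <- dbscan(faithful, eps = 3, minPts = 10)

dbsClust                  # summary: clusters found and number of noise cases
table(dbsClust$cluster)   # cluster 0 contains the cases labelled as noise
```

In practice, eps is often chosen by inspecting a k-nearest-neighbor distance plot (for example with kNNdistplot() from the dbscan package), though whether the book does so here is not shown in this excerpt.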