Chapter 8. Unsupervised methods
This chapter covers
- Using R’s clustering functions to explore data and look for similarities
- Choosing the right number of clusters
- Evaluating a clustering
- Using R’s association rules functions to find patterns of co-occurrence in data
- Evaluating a set of association rules
The methods that we’ve discussed in previous chapters build models to predict outcomes. In this chapter, we’ll look at methods to discover unknown relationships in data. These methods are called unsupervised methods. With unsupervised methods, there’s no outcome that you’re trying to predict; instead, you want to discover patterns in the data that perhaps you hadn’t previously suspected. For example, you may want to find groups of customers with similar purchase patterns, or correlations between population movement and socioeconomic factors. Unsupervised analyses are often not ends in themselves; rather, they’re ways of finding relationships and patterns that can be used to build predictive models. In fact, we encourage you to think of unsupervised methods as exploratory—procedures that help you get your hands in the data—rather than as black-box approaches that mysteriously and automatically give you “the right answer.”
In this chapter, we’ll look at two classes of unsupervised methods. Cluster analysis finds groups of observations in your data that share similar characteristics. Association rule mining finds elements or properties in the data that tend to occur together.