Chapter 5. Classification: placing things where they belong

This chapter covers:

Understanding classification techniques based on probabilities and rules
Automatically categorizing email messages
Detecting fraudulent financial transactions with neural networks

“What is this?” is perhaps the question children ask most frequently. The popularity of that question among children, whose inquisitive nature is as wonderful as it is persistent, shouldn’t be surprising. In order to understand the world around us, we organize our perceptions into groups and categories (labeled groups, possibly structured). In the previous chapter, we presented a number of clustering algorithms that can help us group data points together. In this chapter, we’ll present a number of classification algorithms that help us assign each data point to an appropriate category, also referred to as a class (hence the term classification). The act of classification answers a child’s question with a statement of the form “This is a boat,” “This is a tree,” “This is a house,” and so on. Classification relies on a priori reference structures that divide the space of all possible data points into a set of classes that are usually, but not necessarily, nonoverlapping. Contrast this with the arbitrary nature of the clusters that we described in the previous chapter.

5.1. The need for classification

5.2. An overview of classifiers

5.3. Automatic categorization of emails and spam filtering

5.4. Fraud detection with neural networks

5.5. Are your results credible?

5.6. Classification with very large datasets

5.7. Summary

5.8. To do

5.9. References