Chapter 7. Classification, categorization, and tagging
In this chapter
- Learn the basic concepts behind classification, categorization, and tagging
- Discover how categorization is used in text applications
- Build, train, and evaluate classifiers using open source tools
- Integrate categorization into a search application
- Build a tag recommendation engine trained using tagged data
Chances are you’ve encountered keyword tags somewhere among the websites you’ve visited. Photos, videos, music, news or blog articles, and tweets are frequently accompanied by words and phrases that provide a quick description of the content you’re viewing and a link to related items. You’ve possibly seen tag clouds: displays of different-sized words representing someone’s favorite discussion topics, movie genres, or musical styles. Tags are everywhere on the web and are used as navigation devices or to organize everything from news to bookmarks (see figure 7.1).
Figure 7.1. Tags used in a Twitter post. Hashtags, which start with the # character, mark keywords in a tweet, whereas tags referencing other users start with the @ character.

Tags are data about data, otherwise referred to as metadata. They can be applied to any sort of content and come in many forms, from a simple, unstructured list of relevant keywords or usernames to highly structured properties such as height, weight, and eye color.
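To make the two ends of that range concrete, the following minimal Python sketch pulls unstructured tags out of a tweet and stores them next to structured properties. It isn't taken from any particular library: the `TaggedItem` class, the `extract_tweet_tags` function, the sample tweet, and the property names and values are all hypothetical, and the regular expressions assume that a tag is simply a run of word characters following # or @.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TaggedItem:
    """A piece of content plus its metadata (hypothetical structure)."""
    content: str
    # Unstructured metadata: a flat list of keywords
    tags: list = field(default_factory=list)
    # Structured metadata: named properties with values
    properties: dict = field(default_factory=dict)


def extract_tweet_tags(text):
    """Return (hashtags, mentions) found in text, without the # or @ prefix.

    Assumption: a tag is a run of word characters after the marker.
    """
    hashtags = re.findall(r"#(\w+)", text)
    mentions = re.findall(r"@(\w+)", text)
    return hashtags, mentions


# Made-up tweet for illustration only
tweet = "Reading about #classification and #tagging with @example_user"
hashtags, mentions = extract_tweet_tags(tweet)

# Unstructured tags: just a list of keywords attached to the tweet
tweet_item = TaggedItem(content=tweet, tags=hashtags,
                        properties={"mentions": mentions})

# Structured properties: each value has a name (values are invented)
profile_item = TaggedItem(
    content="profile of a hypothetical person",
    properties={"height_cm": 180, "weight_kg": 75, "eye_color": "brown"},
)

print(tweet_item.tags)
print(profile_item.properties["eye_color"])
```

The keyword list can be applied to almost anything with no agreement on vocabulary, whereas the named properties only make sense when every item shares the same set of fields.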