Chapter 10. Grouping unlabeled items using k-means clustering

 

This chapter covers

  • The k-means clustering algorithm
  • Cluster postprocessing
  • Bisecting k-means
  • Clustering geographic points

The 2000 and 2004 presidential elections in the United States were close, very close. The largest share of the popular vote that any candidate received was 50.7%, and the smallest was 47.9%. If even a small percentage of voters had switched sides, the outcome of the elections would have been different. There are small groups of voters who, when properly appealed to, will switch sides. These groups may not be huge, but in races this close they may be big enough to change the outcome of the election.[1] How do you find these groups of people, and how do you appeal to them with a limited budget? The answer is clustering.

[1] For details on how microtargeting was used successfully in the 2004 U.S. presidential campaign, see Fournier, Sosnik, and Dowd, Applebee’s America (Simon & Schuster, 2006).

Let me tell you how it’s done. First, you collect information on people, either with or without their consent: any sort of information that might give some clue about what is important to them and what will influence how they vote. Then you feed this information into a clustering algorithm. Next, for each cluster (it would be smart to start with the largest one), you craft a message that will appeal to those voters. Finally, you deliver the campaign and measure whether it’s working.
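The workflow above can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation developed later in the chapter: the voter records are made-up two-feature survey scores, and the names `k_means`, `clusters_by_size`, and `voters` are hypothetical helpers introduced here. It groups the records with a plain k-means loop and then ranks the clusters so the largest can be targeted first.

```python
import random
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=100, seed=0):
    """Plain k-means: returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # No point changed cluster, so we have converged.
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # Leave an empty cluster's centroid where it is.
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return centroids, assignments


def clusters_by_size(assignments):
    """List of (cluster_id, size) pairs, largest cluster first."""
    return Counter(assignments).most_common()


# Made-up data: two survey scores per voter (an assumption for illustration).
voters = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),   # one group of voters
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),         # a second, smaller group
]

centroids, assignments = k_means(voters, k=2)
ranking = clusters_by_size(assignments)

for cluster_id, size in ranking:
    print(f"cluster {cluster_id}: {size} voters, centroid {centroids[cluster_id]}")
```

Ranking by size reflects the budget argument in the text: with limited money, the message crafted for the largest cluster reaches the most voters. Real voter data would have many more features and would need scaling before distances are meaningful.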

10.1. The k-means clustering algorithm

 
 
 

10.2. Improving cluster performance with postprocessing

 
 
 

10.3. Bisecting k-means

 
 
 

10.4. Example: clustering points on a map

 
 

10.5. Summary
