chapter twelve
Our goal is to extract locations from disease-related headlines to uncover the largest active epidemics within and outside of the United States. We will proceed as follows:
- Load the data.
- Extract locations from the text using regular expressions and the GeoNamesCache library.
- Check the location matches for errors.
- Cluster the locations based on geographic distance.
- Visualize the clusters on a map, and remove any errors.
- Output representative locations from the largest clusters to draw interesting conclusions.
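
The extraction and clustering steps above can be sketched in a few lines of standard-library Python. This is only a rough illustration under stated assumptions, not the chapter's actual solution: the small `LOCATIONS` dictionary is a hypothetical stand-in for a GeoNamesCache lookup, the headlines are invented for the example, and the greedy grouping with a 100 km threshold is a placeholder for whatever clustering algorithm is ultimately chosen.

```python
import math
import re

# Hypothetical stand-in for a GeoNamesCache lookup: name -> (latitude, longitude).
LOCATIONS = {
    "Miami": (25.77, -80.19),
    "Fort Lauderdale": (26.12, -80.14),
    "Seattle": (47.61, -122.33),
}

# Longest names first, so a multi-word city wins over any shorter name it contains.
_names = sorted(LOCATIONS, key=len, reverse=True)
LOCATION_REGEX = re.compile(r"\b(" + "|".join(map(re.escape, _names)) + r")\b")


def extract_location(headline):
    """Return the first known location named in the headline, or None."""
    match = LOCATION_REGEX.search(headline)
    return match.group(1) if match else None


def great_circle_km(a, b):
    """Haversine distance in kilometers between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def cluster(names, max_km=100.0):
    """Greedy grouping: a name joins the first cluster with a member within max_km."""
    clusters = []
    for name in names:
        for group in clusters:
            if any(great_circle_km(LOCATIONS[name], LOCATIONS[m]) <= max_km
                   for m in group):
                group.append(name)
                break
        else:
            # No nearby cluster was found, so start a new one.
            clusters.append([name])
    return clusters


# Illustrative headlines, invented for this sketch.
headlines = [
    "Zika outbreak hits Miami",
    "More Zika cases reported in Fort Lauderdale",
    "Flu season starts early in Seattle",
    "Experts discuss vaccine safety",
]

# Keep only headlines where a location was matched, then rank clusters by size.
found = [loc for loc in map(extract_location, headlines) if loc]
groups = sorted(cluster(found), key=len, reverse=True)
```

Here the two South Florida cities fall into one cluster and Seattle forms its own, while the headline with no recognizable location is dropped. Real headlines would also need the error checks listed above, since a plain name match cannot tell a city from an identically spelled common word.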