Chapter 8. Scaling GIS on HBase

This chapter covers

  • Adapting HBase to the challenge of indexing multidimensional data
  • Applying domain knowledge in your schema design
  • Putting custom Filters to real-world use

In this chapter, we’ll use HBase to tackle a new domain: Geographic Information Systems (GIS). GIS is an interesting area of exploration because it poses two significant challenges: latency at scale and modeling spatial locality. We’ll use the lens of GIS to demonstrate how to adapt HBase to tackle these challenges. To do so, you’ll need to use domain-specific knowledge to your advantage.

8.1. Working with geographic data

Geographic systems are frequently used as the foundation of an online, interactive user experience. Consider a location-based service, such as Foursquare, Yelp, or Urban Spoon. These services strive to provide relevant information about hundreds of millions of locations all over the globe. Users of these applications depend on them to find, for instance, the nearest coffee shop in an unfamiliar neighborhood. They don’t want a MapReduce job standing between them and their latte. We’ve already discussed HBase as a platform for online data access, so this first constraint, latency at scale, seems a reasonable match for HBase. Still, as you’ve seen in previous chapters, HBase can provide low request latency only when your schema is designed around how the data is physically stored. This brings you conveniently to the second challenge: spatial locality.

8.2. Designing a spatial index

8.3. Implementing the nearest-neighbors query

8.4. Pushing work server-side

8.5. Summary