Chapter 3. Distributed HBase, HDFS, and MapReduce

 

This chapter covers

  • HBase as a distributed storage system
  • When to use MapReduce instead of the key-value API
  • MapReduce concepts and workflow
  • How to write MapReduce applications with HBase
  • How to use HBase for map-side joins in MapReduce
  • Examples of using HBase with MapReduce

As you’ve realized, HBase is built on Apache Hadoop. What may not yet be clear to you is why. Most important, what benefits do we, as application developers, enjoy from this relationship? HBase depends on Hadoop for two separate concerns. Hadoop MapReduce provides a distributed computation framework for high-throughput data access. The Hadoop Distributed File System (HDFS) gives HBase a storage layer providing availability and reliability. In this chapter, you’ll see how TwitBase is able to take advantage of this data access for bulk processing and how HBase uses HDFS to guarantee availability and reliability.

3.1. A case for MapReduce

 

3.2. An overview of Hadoop MapReduce

 
 

3.3. HBase in distributed mode

 

3.4. HBase and MapReduce

 
 

3.5. Putting it all together

 
 
 
 

3.6. Availability and reliability at scale

 
 
 
 

3.7. Summary

 
 
 