Chapter 3. Distributed HBase, HDFS, and MapReduce

This chapter covers

HBase as a distributed storage system
When to use MapReduce instead of the key-value API
MapReduce concepts and workflow
How to write MapReduce applications with HBase
How to use HBase for map-side joins in MapReduce
Examples of using HBase with MapReduce

As you’ve realized, HBase is built on Apache Hadoop. What may not yet be clear to you is why. Most important, what benefits do we, as application developers, enjoy from this relationship? HBase depends on Hadoop for two separate concerns. Hadoop MapReduce provides a distributed computation framework for high-throughput data access. The Hadoop Distributed File System (HDFS) gives HBase a storage layer providing availability and reliability. In this chapter, you’ll see how TwitBase is able to take advantage of this data access for bulk processing and how HBase uses HDFS to guarantee availability and reliability.

3.1. A case for MapReduce

3.2. An overview of Hadoop MapReduce

3.3. HBase in distributed mode

3.4. HBase and MapReduce

3.5. Putting it all together

3.6. Availability and reliability at scale

3.7. Summary