Chapter 5. Extending HBase with coprocessors

This chapter covers

  • Coprocessors and how to use them effectively
  • Types of coprocessors: observer and endpoint
  • How to configure and validate coprocessor installation on your cluster

Everything you’ve seen of HBase as an online system is centered on data access. The five HBase commands introduced in chapter 2 are exclusively for reading or writing data. For the HBase cluster, the most computationally expensive portion of any of those operations occurs when applying server-side filters on Scan results. Even so, that computation is still tightly coupled to data access. You can use custom filters to push application logic onto the cluster, but filters are constrained to the context of a single row. To perform general computation over your data in HBase, you’re forced to rely on Hadoop MapReduce or on custom client code that reads the data, modifies it, and writes it back to HBase.
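To see why that round-trip is costly, here is a minimal sketch of the client-side read-modify-write pattern, written against the 0.92-era client API; the table, family, and qualifier names are hypothetical:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ClientSideIncrement {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");   // hypothetical table name
    Get get = new Get(Bytes.toBytes("row1"));
    Result result = table.get(get);               // data travels to the client...
    byte[] value =
        result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("counter"));
    long counter = Bytes.toLong(value) + 1;       // ...is modified client-side...
    Put put = new Put(Bytes.toBytes("row1"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("counter"),
        Bytes.toBytes(counter));
    table.put(put);                               // ...and travels back over the network.
    table.close();
  }
}

Every such operation pays two network hops per row, and the computation itself runs on a single client machine rather than in parallel across the cluster.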

HBase coprocessors are an addition to our data-manipulation toolset, introduced as a feature in the HBase 0.92.0 release. With coprocessors, we can push arbitrary computation out to the HBase nodes hosting our data, where it runs in parallel across all the RegionServers. This transforms an HBase cluster from horizontally scalable storage into a highly capable, distributed data storage and processing system.
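As a taste of what’s to come, here is a minimal observer sketch against the 0.92-era coprocessor API. The class name AuditObserver and the meta:lastModified column are hypothetical, used only for illustration; section 5.2 develops a full example.

import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.util.Bytes;

public class AuditObserver extends BaseRegionObserver {
  @Override
  public void prePut(ObserverContext<RegionCoprocessorEnvironment> ctx,
      Put put, WALEdit edit, boolean writeToWAL) throws IOException {
    // Runs on the RegionServer, before each Put is applied to the region:
    // stamp the incoming row with the time of modification.
    put.add(Bytes.toBytes("meta"), Bytes.toBytes("lastModified"),
        Bytes.toBytes(System.currentTimeMillis()));
  }
}

One way to load such an observer for every region is to place the jar on the cluster classpath and list the class in the hbase.coprocessor.region.classes property in hbase-site.xml.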

5.1. The two kinds of coprocessors

5.2. Implementing an observer

5.3. Implementing an endpoint

5.4. Summary
