
Foreword
At a high level, HBase is like the atomic bomb. Its basic operation can be explained on the back of a napkin over a drink (or two). Its deployment is another matter.
HBase is composed of multiple moving parts. The distributed HBase application is made up of client and server processes. Then there is the Hadoop Distributed File System (HDFS) to which HBase persists. HBase uses yet another distributed system, Apache ZooKeeper, to manage its cluster state. Most deployments also throw in MapReduce to assist with bulk loading or running distributed full-table scans. It can be tough to get all the pieces pulling together in any approximation of harmony.
Setting up the proper environment and configuration for HBase is critical. HBase is a general data store that can be used in a wide variety of applications. It ships with defaults that are conservatively targeted at a common use case and a generic hardware profile. Its ergonomic ability, its facility for self-tuning, is still under development, so you have to match HBase to your hardware and load, and getting this configuration right can take a couple of attempts.