Chapter 1. Introducing HBase

This chapter covers

The origins of Hadoop, HBase, and NoSQL
Common use cases for HBase
A basic HBase installation
Storing and querying data with HBase

HBase is a database: the Hadoop database. It’s often described as a sparse, distributed, persistent, multidimensional sorted map, indexed by rowkey, column key, and timestamp. You’ll hear people refer to it as a key-value store, a column family-oriented database, and sometimes a database storing versioned maps of maps. All these descriptions are correct. But fundamentally, it’s a platform for storing and retrieving data with random access, meaning you can write data as you like and read it back again as you need it. HBase stores structured and semistructured data naturally, so you can load it with tweets, parsed log files, and a catalog of all your products right along with their customer reviews. It can store unstructured data too, as long as it’s not too large. It doesn’t care about types and allows for a dynamic and flexible data model that doesn’t constrain the kind of data you store.
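To make the "versioned maps of maps" description concrete, here is a minimal in-memory sketch in Python. It is not the HBase API: the `ToyTable` class, its method names, and the sample rowkeys and column keys are all hypothetical, chosen only to show how a value can be addressed by rowkey, column key, and timestamp, and how rows with missing columns simply take up no space.

```python
from collections import defaultdict
import time


class ToyTable:
    """Toy model only: rowkey -> column key -> timestamp -> value."""

    def __init__(self):
        # Nested maps; a cell exists only if something was written to it (sparse).
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, rowkey, column, value, timestamp=None):
        # Each write adds a new version instead of overwriting the old one.
        ts = timestamp if timestamp is not None else time.time_ns()
        self._rows[rowkey][column][ts] = value

    def get(self, rowkey, column, timestamp=None):
        # Return the newest version, or the newest one at or before `timestamp`.
        versions = self._rows.get(rowkey, {}).get(column, {})
        candidates = [ts for ts in versions if timestamp is None or ts <= timestamp]
        return versions[max(candidates)] if candidates else None

    def scan(self):
        # Rows come back in sorted rowkey order, each with its latest cell values.
        for rowkey in sorted(self._rows):
            columns = self._rows[rowkey]
            yield rowkey, {c: v[max(v)] for c, v in sorted(columns.items())}


table = ToyTable()
table.put("user2", "info:name", "Bo", timestamp=1)
table.put("user1", "info:name", "Al", timestamp=1)
table.put("user1", "info:name", "Alice", timestamp=2)   # second version of the same cell
table.put("user1", "info:email", "alice@example.com", timestamp=2)

latest_name = table.get("user1", "info:name")
older_name = table.get("user1", "info:name", timestamp=1)
missing = table.get("user2", "info:email")               # never written, so nothing stored
row_order = [rowkey for rowkey, _ in table.scan()]
```

The sketch covers only the sparse, sorted, and multidimensional parts of the description. It says nothing about distribution or persistence, and it sorts Python strings where a real table would compare raw bytes.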

HBase isn’t a relational database like the ones you’re likely accustomed to. It doesn’t speak SQL or enforce relationships within your data. It doesn’t support transactions that span multiple rows, and it doesn’t mind storing an integer in one row and a string in another for the same column.
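The point about mixed types in one column is easier to see once you treat every stored value as an uninterpreted run of bytes. The following Python sketch assumes exactly that and nothing more. The `cells` dictionary, the `put` helper, and the `cf:value` column key are hypothetical stand-ins, not HBase client calls.

```python
import struct

# Toy store: (rowkey, column key) -> raw bytes. Nothing here records a type.
cells = {}


def put(rowkey, column, raw):
    # The store accepts bytes only; encoding is the caller's job.
    if not isinstance(raw, bytes):
        raise TypeError("values must be bytes")
    cells[(rowkey, column)] = raw


# Same column, two rows, two different encodings.
put(b"row1", b"cf:value", struct.pack(">q", 42))          # 8-byte big-endian integer
put(b"row2", b"cf:value", "forty-two".encode("utf-8"))    # UTF-8 text

# The reader has to know which decoding applies to which row.
as_int = struct.unpack(">q", cells[(b"row1", b"cf:value")])[0]
as_text = cells[(b"row2", b"cf:value")].decode("utf-8")
```

Because the store never checks what the bytes mean, keeping encodings consistent within a column becomes an application-level convention rather than something a schema enforces.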

1.1. Data-management systems: a crash course

1.2. HBase use cases and success stories

1.3. Hello HBase

1.4. Summary