Chapter 4. HBase table design

This chapter covers

HBase schema design concepts
Mapping relational modeling knowledge to the HBase world
Advanced table definition parameters
HBase Filters to optimize read performance

In the first three chapters, you learned how to interact with HBase using the Java API and built a sample application along the way. As part of building TwitBase, you created tables in your HBase instance to store data. The table definitions were given to you, and you created the tables without examining why they were designed the way they were. In other words, we didn’t talk about how many column families to have, how many columns to have in a column family, what data should go into the column names and what should go into the cells, and so on.

This chapter introduces you to HBase schema design and covers what you should think about when designing schemas and rowkeys in HBase. HBase schemas are different from relational database schemas. They’re much simpler and give you only a few things to configure, which is why HBase is sometimes described as schema-less. But that simplicity lets you tune a schema to extract optimal performance for your application’s access patterns. A schema that is great for writes may not perform as well when the same data is read back, and vice versa.

4.1. How to approach schema design

4.2. De-normalization is the word in HBase land

4.3. Heterogeneous data in the same table

4.4. Rowkey design strategies

4.5. I/O considerations

4.6. From relational to non-relational

4.7. Advanced column family configurations

4.8. Filtering data

4.9. Summary