1 The world of the Apache Iceberg lakehouse

This chapter covers

  • What a data lakehouse is and how it differs from traditional data architectures
  • How Apache Iceberg shapes the lakehouse paradigm
  • When and why you should implement an Apache Iceberg lakehouse

The evolution of data architecture has been shaped by a constant struggle to balance performance, cost, and flexibility while ensuring data remains accessible and governed. Over the years, businesses have cycled through various approaches—data warehouses (analytics-optimized databases), data lakes (analytics on files stored on distributed storage), and hybrid solutions—each attempting to solve the challenges of scaling analytics, reducing complexity, and controlling costs.

1.1 What is a data lakehouse?

1.1.1 The rise of data warehouses

1.1.2 The move to cloud data warehouses

1.1.3 The data lake and the Hadoop era

1.1.4 Apache Iceberg: The key to the data lakehouse

1.1.5 The data lakehouse: The best of both worlds

1.2 What is Apache Iceberg?

1.2.1 The need for a table format

1.2.2 How Apache Iceberg manages metadata

1.2.3 Key features of Apache Iceberg

1.2.4 Apache Iceberg as an open-source standard

1.3 The benefits of Apache Iceberg

1.3.1 ACID transactions

1.3.2 Table evolution

1.3.3 Time travel & snapshot-based queries

1.3.4 Hidden partitioning to reduce accidental full-table scans

1.3.5 Cost efficiency & optimized query performance

1.4 The components of an Apache Iceberg lakehouse

1.4.1 The storage layer: The foundation of your lakehouse

1.4.2 The ingestion layer: Feeding data into Iceberg tables

1.4.3 The catalog layer: The entry point to your lakehouse

1.4.4 The federation layer: Modeling & accelerating data

1.4.5 The consumption layer: Delivering value to the business

1.5 Summary