chapter three

3 Hands-on with Apache Iceberg

This chapter covers

  • Setting up an Apache Iceberg environment
  • Creating Iceberg tables in Spark
  • Reading Iceberg tables in Dremio
  • Building a business intelligence dashboard

We’ve explored the key ideas behind an Apache Iceberg lakehouse, including its architecture, components, and tradeoffs. This chapter moves from theory to hands-on work. The goal isn’t just to build something that works, but to see how a lakehouse comes together from end to end, so that when we examine each component later in the book, you’ll have a concrete picture of how the pieces fit.

In this chapter’s hands-on exercise, you’ll set up a working lakehouse environment on your laptop. After configuring the environment, you’ll create Iceberg tables in Spark, query them using Dremio, and explore the data in a business intelligence (BI) dashboard. By the end of the chapter, you’ll have an environment you can experiment with and refer back to as we dig into the individual components in part 2.

3.1 Our example

3.2 Setting up an Apache Iceberg environment

3.2.1 Prerequisite: Installing Docker

3.2.2 Creating the Docker Compose file

3.2.3 Running the environment

3.2.4 Accessing services

3.3 Creating Iceberg tables in Spark

3.3.1 Populating the PostgreSQL database

3.3.2 Starting Apache Spark

3.3.3 Configuring Apache Spark for Iceberg

3.3.4 Loading data from PostgreSQL into Iceberg

3.3.5 Verifying data storage in MinIO

3.4 Reading Iceberg tables with Dremio

3.4.1 Starting Dremio

3.4.2 Connecting Dremio to the Nessie catalog

3.4.3 Querying Iceberg tables in Dremio

3.5 Creating a BI dashboard from your Iceberg tables

3.5.1 Starting Apache Superset

3.5.2 Connecting Superset to Dremio

3.5.3 Creating a dataset from Iceberg tables