2 Hands-on with Apache Iceberg

 

This chapter covers

  • Setting up an Apache Iceberg environment
  • Creating Iceberg tables in Spark
  • Reading Iceberg tables in Dremio
  • Building a business intelligence (BI) dashboard

We’ve explored the theory behind the Apache Iceberg lakehouse: its architecture, its components, and its benefits. Now it’s time to bring that theory to life with a hands-on exercise, setting up a fully functional Iceberg lakehouse on your laptop.

This chapter walks you through an end-to-end setup: standing up the environment with Docker, creating Iceberg tables with Apache Spark, querying them with Dremio, and visualizing the results in a BI dashboard built with Apache Superset. By the end of the chapter, you’ll have a working Iceberg lakehouse to experiment with, giving you the confidence and foundation to design a production-scale implementation.

2.1 Setting up an Apache Iceberg environment

2.1.1 Prerequisites: Install Docker

2.1.2 Creating the Docker Compose file

2.1.3 Running the environment

2.1.4 Accessing the services

2.2 Creating Iceberg tables in Spark

2.2.1 Populating the PostgreSQL database
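
There is no single correct way to seed the source database, but the following sketch with the psycopg2 driver shows the general shape of this step. The host, port, credentials, and the sample customers table are placeholders for illustration; match them to whatever your Docker Compose file defines.

import psycopg2

# Connect to the PostgreSQL container. Host, port, database name, and
# credentials below are assumptions; align them with your compose file.
conn = psycopg2.connect(
    host="localhost", port=5432,
    dbname="mydb", user="postgres", password="postgres",
)
cur = conn.cursor()

# Create a small sample table and insert a few rows to load into Iceberg later.
cur.execute("""
    CREATE TABLE IF NOT EXISTS customers (
        id SERIAL PRIMARY KEY,
        name TEXT NOT NULL,
        city TEXT
    )
""")
cur.execute(
    "INSERT INTO customers (name, city) VALUES (%s, %s), (%s, %s)",
    ("Ada", "London", "Grace", "New York"),
)
conn.commit()
cur.close()
conn.close()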

2.2.2 Starting the Apache Spark environment

2.2.3 Configuring Apache Spark for Iceberg
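
At a high level, configuring Spark for Iceberg with a Nessie catalog comes down to a handful of session settings: the Iceberg and Nessie SQL extensions, a named catalog backed by the Nessie catalog implementation, and the MinIO endpoint for the warehouse path. The sketch below assumes the Iceberg and Nessie Spark runtime packages are already on the classpath, and the service hostnames, bucket name, and API path are illustrative defaults that may differ in your setup.

from pyspark.sql import SparkSession

# Service addresses are assumptions; on a compose network, containers
# typically resolve by service name (e.g., nessie, minio).
spark = (
    SparkSession.builder.appName("iceberg-setup")
    # Enable Iceberg's and Nessie's Spark SQL extensions.
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,"
            "org.projectnessie.spark.extensions.NessieSparkSessionExtensions")
    # Register a catalog named "nessie" backed by the Nessie catalog implementation.
    .config("spark.sql.catalog.nessie", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.nessie.catalog-impl", "org.apache.iceberg.nessie.NessieCatalog")
    .config("spark.sql.catalog.nessie.uri", "http://nessie:19120/api/v1")
    .config("spark.sql.catalog.nessie.ref", "main")
    # Store table data and metadata in a MinIO bucket via Iceberg's S3FileIO.
    .config("spark.sql.catalog.nessie.warehouse", "s3a://warehouse/")
    .config("spark.sql.catalog.nessie.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .config("spark.sql.catalog.nessie.s3.endpoint", "http://minio:9000")
    .getOrCreate()
)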

2.2.4 Loading data from PostgreSQL into Iceberg
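
Conceptually, this step is a JDBC read followed by an Iceberg write. A minimal sketch, reusing the Spark session above and the hypothetical customers table and connection details from earlier (it also assumes the PostgreSQL JDBC driver is on the classpath):

# Read the source table over JDBC; URL, table, and credentials are placeholders.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://postgres:5432/mydb")
    .option("dbtable", "customers")
    .option("user", "postgres")
    .option("password", "postgres")
    .option("driver", "org.postgresql.Driver")
    .load()
)

# Create the target namespace, then write the data as an Iceberg table
# in the Nessie catalog. The "sales" namespace is an illustrative choice.
spark.sql("CREATE NAMESPACE IF NOT EXISTS nessie.sales")
df.writeTo("nessie.sales.customers").createOrReplace()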

2.2.5 Verifying data storage in MinIO
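
Besides browsing the MinIO console, you can list the warehouse bucket programmatically. A quick check with boto3, assuming MinIO's API is exposed on localhost:9000 with the default minioadmin credentials and the warehouse bucket name used above:

import boto3

# Point an S3 client at MinIO; endpoint and credentials are assumptions.
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",
    aws_access_key_id="minioadmin",
    aws_secret_access_key="minioadmin",
)

# Iceberg writes both data files (Parquet) and metadata files (JSON, Avro),
# so you should see paths for each under the table's location.
resp = s3.list_objects_v2(Bucket="warehouse")
for obj in resp.get("Contents", []):
    print(obj["Key"])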

2.3 Reading Iceberg tables in Dremio

2.3.1 Starting Dremio

2.3.2 Connecting Dremio to the Nessie Catalog

2.3.3 Querying Iceberg tables in Dremio
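
You will mostly query through Dremio's SQL editor in the browser, but the same queries can run from a script over Dremio's Arrow Flight endpoint (port 32010 by default). A sketch with pyarrow, assuming a local Dremio instance, placeholder credentials, and the hypothetical nessie.sales.customers table from earlier:

from pyarrow import flight

# Authenticate against Dremio's Arrow Flight endpoint; host, port, and
# credentials are assumptions for a local setup.
client = flight.FlightClient("grpc://localhost:32010")
token = client.authenticate_basic_token("admin", "password123")
options = flight.FlightCallOptions(headers=[token])

# Run a SQL query against the Iceberg table registered in the Nessie catalog.
query = "SELECT * FROM nessie.sales.customers LIMIT 10"
info = client.get_flight_info(flight.FlightDescriptor.for_command(query), options)
reader = client.do_get(info.endpoints[0].ticket, options)
print(reader.read_all().to_pandas())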

2.4 Creating a BI dashboard from your Iceberg tables

2.4.1 Starting Apache Superset

2.4.2 Connecting Superset to Dremio
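
Superset connects to Dremio through a SQLAlchemy URI, which you paste into Superset's database connection form. With the sqlalchemy-dremio package installed, a Flight-based URI has roughly the shape below; the exact scheme and options vary by driver version, and the credentials are placeholders. The snippet is a quick smoke test you can run before configuring Superset with the same URI.

from sqlalchemy import create_engine, text

# Illustrative URI; check the sqlalchemy-dremio documentation for the
# exact format your driver version expects.
engine = create_engine(
    "dremio+flight://admin:password123@localhost:32010/dremio?UseEncryption=false"
)

# If this round trip succeeds, Superset should accept the same URI.
with engine.connect() as conn:
    print(conn.execute(text("SELECT 1")).fetchall())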

2.4.3 Creating a dataset from Iceberg tables

2.4.4 Building charts and dashboards

2.5 Summary