appendix B  Python for Apache Iceberg

 

Apache Iceberg has become a widely adopted open table format for modern data lakehouses, and Python provides one of the most adaptable ecosystems for working with it. This appendix introduces practical ways to use Iceberg either directly, through PyIceberg, or indirectly, through Python libraries and frameworks that read and write Iceberg tables. Each section focuses on a single library, explains its connection to Iceberg, and includes step-by-step examples for both ETL and analytical workloads.

The goal is to show you how to build, manage, and analyze Iceberg data entirely in Python, without depending on JVM-based systems such as Spark. You’ll learn how to define schemas, create tables, append and overwrite data, and run queries using tools such as PyIceberg, Polars, DuckDB, Daft, PyDremio, Bauplan, and Spice AI.

Each tool plays a different role in the Python-Iceberg ecosystem:

  • PyIceberg provides direct, low-level access to Iceberg tables and catalogs.
  • Polars and DuckDB deliver high-performance, in-process analytics on Iceberg data, with no separate server to run.
  • Daft adds distributed computation built on Apache Arrow.
  • Dremio provides comprehensive SQL support for Iceberg, with scalable query execution.
  • Bauplan extends Iceberg’s data model with Git-style branching and version control.
  • Spice AI enables federated queries and intelligent analytics over Iceberg datasets.

B.1 PyIceberg

B.2 Polars

B.3 DuckDB

B.4 Daft

B.5 Dremio

B.6 Bauplan

B.7 Spice AI

B.8 Summary and best practices