Appendix A The metadata tables

 

Apache Iceberg tracks rich metadata to efficiently manage its tables. This metadata isn’t just for internal bookkeeping; it’s also exposed to users through special metadata tables that can be queried like regular tables. These metadata tables provide visibility into the physical and logical layout of an Iceberg table, offering insights into data file sizes, partitions, snapshots, and more.

Understanding how to use these metadata tables is essential for monitoring table health, diagnosing performance issues, and automating maintenance tasks like compaction and snapshot expiration. This appendix provides an overview of these tables, demonstrates how to query them using common engines like Spark and Dremio, and explains how to interpret their outputs.
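As a small preview of the query pattern, in Spark SQL a metadata table is addressed by appending its name to the table identifier (for example, `db.orders.snapshots`). The sketch below only builds such a query string. The helper `metadata_table_query` and the table name `my_catalog.db.orders` are hypothetical names chosen for illustration, and other engines, Dremio included, may use a different syntax that this sketch does not cover.

```python
from typing import Optional

# Metadata table names, taken from the section titles of this appendix.
METADATA_TABLES = (
    "history", "snapshots", "metadata_log_entries", "manifests",
    "partitions", "files", "position_deletes", "all_data_files",
    "all_delete_files", "all_entries", "all_manifests", "refs",
)


def metadata_table_query(table: str, metadata_table: str,
                         limit: Optional[int] = None) -> str:
    """Build a Spark SQL query against one of a table's metadata tables.

    Hypothetical helper for illustration; it is not part of Iceberg or Spark.
    """
    if metadata_table not in METADATA_TABLES:
        raise ValueError(f"unknown metadata table: {metadata_table!r}")
    # Spark addresses a metadata table as <table identifier>.<metadata table>.
    query = f"SELECT * FROM {table}.{metadata_table}"
    if limit is not None:
        query += f" LIMIT {int(limit)}"
    return query


# Hypothetical table identifier; replace with a real catalog.db.table name.
query = metadata_table_query("my_catalog.db.orders", "snapshots", limit=10)
print(query)

# With an Iceberg-enabled SparkSession named `spark` (an assumption here),
# the query could then be run as:
#   spark.sql(query).show()
```

Validating the name against a fixed list keeps a typo such as `snapshot` from turning into a confusing "table not found" error at query time.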

We’ll walk through each of Iceberg’s primary metadata tables and look at how they can be used in practical maintenance scenarios. We’ll also examine how these tables enable proactive optimization workflows, serving as a foundation for dynamic maintenance triggers.

A.1 Querying Iceberg metadata tables

A.2 The history metadata table

A.3 The snapshots metadata table

A.4 The metadata_log_entries metadata table

A.5 The manifests metadata table

A.6 The partitions metadata table

A.7 The files metadata table

A.8 The position_deletes metadata table

A.9 The all_data_files metadata table

A.10 The all_delete_files metadata table

A.11 The all_entries metadata table

A.12 The all_manifests metadata table

A.13 The refs metadata table

A.14 Monitoring table health with metadata tables

A.14.1 Example: Triggering compaction based on file metrics

A.14.2 Example: Monitoring snapshot frequency

A.14.3 Automating maintenance with insights