Appendix C The Apache Iceberg specification

 

Apache Iceberg is more than an open table format; it is a specification with clearly defined rules for how table metadata, snapshots, partitioning, and schema evolution behave. These rules ensure consistent behavior across tools and engines, enabling interoperability, reliability, and long-term stability. As Iceberg evolves, each new version of the spec introduces new capabilities while balancing them against backward compatibility.

This appendix provides a concise but thorough reference to the Iceberg specification. It begins by explaining the purpose and structure of the specification, then walks through format versions 1 through 4, highlighting the major features introduced in each. You’ll also find guidance on how metadata, snapshots, and concurrency work under the hood, coverage of the REST Catalog API and Puffin file format specifications, and notes on compatibility and migration between format versions.

Whether you're implementing your own engine integration or evaluating compatibility between tools, this appendix will help you understand what guarantees Iceberg makes and how those guarantees continue to evolve.

C.1 Understanding the Iceberg specification

C.1.1 What is a table format specification?

C.1.2 Why Iceberg formalizes table behavior

C.1.3 Evolution of the spec: versioning principles and compatibility

C.2 Iceberg table format versions

C.2.1 Version 1: Foundation for analytical tables

C.2.2 Version 2: Row-level deletes and stricter writes

C.2.3 Version 3: Extended types and advanced capabilities

C.2.4 Version 4: Performance, portability, and real-time readiness

C.3 Snapshot management and table metadata

C.3.1 Table metadata files

C.3.2 Snapshots and the manifest list

C.3.3 Sequence numbers and optimistic concurrency

C.4 The REST Catalog specification

C.4.1 Overview and purpose

C.4.2 Catalog configuration and default endpoints

C.4.3 Namespaces, tables, and views

C.4.4 Table registration, metrics, and transactions

C.4.5 OAuth2 support and security considerations

C.4.6 The scan planning endpoint

C.5 Puffin file format specification

C.5.1 What is a Puffin file?

C.5.2 Storing column-level metrics and custom indexes

C.5.3 Integration with Iceberg table metadata

C.6 Compatibility and migration

C.6.1 Reading and writing across format versions

C.6.2 Upgrading tables to newer spec versions

C.6.3 Handling backward compatibility in practice