3 Preparing for your move to Apache Iceberg

This chapter covers

  • Performing an infrastructure audit
  • Engaging stakeholders to surface technical and organizational needs
  • Documenting current tooling, storage systems, and governance practices
  • Converting audit findings into prioritized, actionable requirements
  • Weighing considerations for storage, ingestion, catalog, federation, and data consumption

Implementing Apache Iceberg is not simply a technical upgrade. It’s a shift in how your organization handles data, from ingestion and storage to governance and analytics. While Chapter 2 gave you hands-on exposure to Iceberg’s capabilities, moving to production requires more than experimentation. It demands a structured understanding of your current environment.

Every organization’s data landscape is unique. You may be using different file formats, relying on legacy ETL tools, or navigating region-specific compliance constraints. Attempting to integrate Iceberg into this environment without first conducting an audit risks introducing inefficiencies, unexpected costs, or governance gaps. A thoughtful audit helps you avoid these pitfalls by turning a broad set of options into a focused strategy.

3.1 Conducting your data platform audit

3.1.1 Who are the stakeholders?

3.1.2 What should you ask stakeholders?

3.1.3 Conducting a technological audit

3.2 Hamerliva Bank’s audit in action

3.2.1 Hamerliva Bank interviews their stakeholders

3.2.2 Hamerliva Bank audits its technology

3.2.3 Hamerliva Bank summarizes its audit findings

3.3 From audit to requirements: Laying the foundation for design

3.3.1 Defining storage requirements

3.3.2 Defining ingestion requirements

3.3.3 Defining catalog requirements

3.3.4 Defining federation requirements

3.3.5 Defining consumption requirements

3.3.6 Hamerliva Bank establishes its requirements

3.4 Architectural plan and road show

3.4.1 Hamerliva Bank creates its architectural plan

3.4.2 Hamerliva Bank conducts a road show

3.5 Summary