8 Designing the federation layer
This chapter covers
- Evaluating requirements for data federation
- Designing the federation layer components
- Comparing Dremio and Trino for federated querying
- Weighing self-managed and cloud-managed federation options
- Selecting a federation platform based on use cases
As your Apache Iceberg lakehouse takes shape, it’s important to recognize that not all data will reside in Iceberg tables. Despite your best efforts to centralize and standardize, some datasets will remain scattered and locked in third-party systems, legacy databases, and SaaS applications, while others are simply not worth the effort to extract, transform, and load into your lakehouse. These realities make it essential to extend your architecture with a federation layer.
The federation layer acts as both a bridge and a harmonizer. It enables your analytics platform to access data across multiple systems without physically consolidating it. At the same time, it can introduce a semantic layer that standardizes business logic, ensuring consistency in metrics and datasets regardless of their origin. Whether your analysts query data through notebooks, BI dashboards, or custom applications, the federation layer provides a unified, governed interface to the underlying data ecosystem.