7 Designing the federation layer
This chapter covers
- Evaluating requirements for data federation
- Designing the federation layer components
- Comparing Dremio and Trino for federated querying
- Self-managed and cloud-managed federation options
- Selecting a federation platform based on use cases
As your Apache Iceberg lakehouse takes shape, it is important to recognize that not all data will reside in Iceberg tables. Despite your best efforts to centralize and standardize, some datasets will remain scattered. They may be locked in third-party systems, legacy databases, and SaaS applications, or they may simply not be worth the effort of extracting, transforming, and loading into your lakehouse. These realities make it essential to extend your architecture with a federation layer.
The federation layer acts as both a bridge and a harmonizer. It enables your analytics platform to access data across multiple systems without physically consolidating it. It can also introduce a semantic layer that standardizes business logic, ensuring consistent metrics and datasets regardless of where the data originates. Whether your analysts query data through notebooks, BI dashboards, or custom applications, the federation layer can provide a unified, governed interface to the underlying data ecosystem, as illustrated in figure 7.1.