chapter eight

8 Designing the federation layer

 

This chapter covers

  • Evaluating requirements for data federation
  • Designing the federation layer components
  • Comparing Dremio and Trino for federated querying
  • Weighing self-managed and cloud-managed federation options
  • Selecting a federation platform based on use cases

As your Apache Iceberg lakehouse takes shape, it’s important to recognize that not all data will reside in Iceberg tables. Despite your best efforts to centralize and standardize, some datasets will remain scattered across third-party systems, legacy databases, and SaaS applications, and others simply won’t be worth the effort to extract, transform, and load into your lakehouse. These realities make it essential to extend your architecture with a federation layer.

The federation layer acts as both a bridge and a harmonizer. It enables your analytics platform to access data across multiple systems without physically consolidating it. At the same time, it can introduce a semantic layer that standardizes business logic, ensuring consistency in metrics and datasets regardless of their origin. Whether your analysts query data through notebooks, BI dashboards, or custom applications, the federation layer provides a unified, governed interface to the underlying data ecosystem.
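
To make the idea concrete before turning to real engines, here is a minimal sketch that uses Python's built-in sqlite3 module as a toy stand-in for a federation engine. It is not how Dremio or Trino work internally. The two attached databases, the orders and customers tables, and the revenue_by_region view are all hypothetical names chosen for illustration. One connection joins data that lives in two separate databases without copying it, and a shared view plays the role of a semantic-layer metric.

```python
import sqlite3

# Toy stand-in for a federation engine: one connection, two separate databases.
# All table, column, and view names below are hypothetical.
conn = sqlite3.connect(":memory:")                 # plays the "lakehouse" source
conn.execute("ATTACH DATABASE ':memory:' AS crm")  # plays an external "CRM" source

# Source 1: order facts that live in the lakehouse.
conn.execute("CREATE TABLE main.orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO main.orders VALUES (?, ?)",
    [(1, 120.0), (1, 80.0), (2, 50.0)],
)

# Source 2: customer attributes that stay in the external system.
conn.execute("CREATE TABLE crm.customers (id INTEGER, region TEXT)")
conn.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)",
    [(1, "EMEA"), (2, "APAC")],
)

# "Semantic layer": one shared definition of revenue by region.
# SQLite only lets TEMP views reference tables in more than one database.
conn.execute("""
    CREATE TEMP VIEW revenue_by_region AS
    SELECT c.region AS region, SUM(o.amount) AS revenue
    FROM main.orders AS o
    JOIN crm.customers AS c ON c.id = o.customer_id
    GROUP BY c.region
""")

# Consumers query the view and never need to know where each table lives.
rows = conn.execute(
    "SELECT region, revenue FROM revenue_by_region ORDER BY region"
).fetchall()
print(rows)  # [('APAC', 50.0), ('EMEA', 200.0)]
```

In a real federation layer the two sources would be different systems reached through connectors, but the shape is the same: data stays where it is, and a shared definition keeps the metric consistent for every tool that queries it.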

8.1 What data federation is and why it matters

8.1.1 Common use cases and challenges driving federation needs

8.1.2 How federation aligns with agility and accessibility

8.2 Key requirements for federation

8.2.1 Supporting diverse data sources without duplication

8.2.2 Ensuring consistent semantics and business logic

8.2.3 Providing seamless connectivity for analytics tools

8.3 Introducing Dremio and Trino

8.3.1 Dremio

8.3.2 Dremio’s architecture

8.3.3 Dremio’s connector ecosystem and Iceberg-centric focus

8.3.4 Dremio’s performance enhancements

8.3.5 Trino

8.3.6 Trino’s modular architecture for wide-source support

8.3.7 Trino’s flexibility and configurability for complex environments

8.3.8 Trino’s community-led evolution and vendor extensions

8.3.9 Semantic layer considerations in Trino

8.4 Deployment models