chapter eight

8 Understanding the consumption layer

 

This chapter covers

  • Semantic consistency across tools
  • Open interfaces such as JDBC, ODBC, Arrow Flight, and MCP
  • Evaluating BI tools, notebook environments, and AI platforms for integration
  • Choosing the right consumption tools

Now that your lakehouse has a solid foundation, from storage and ingestion to catalog and federation, it’s time to focus on where data creates value: consumption. This is where your lakehouse architecture begins to yield insights, drive decisions, and power innovation. Whether you’re enabling real-time dashboards, supporting ad hoc data exploration in Python notebooks, or training large-scale machine learning models, the consumption layer connects your technical investment to practical outcomes.

In traditional data architectures, consumption was often constrained by data movement, format compatibility, and tool lock-in. Accessing data meant replicating it into specialized databases, BI tools, and other systems, each with its own constraints. Apache Iceberg’s emphasis on open, portable table formats has reshaped this paradigm. Now the data remains in place, and the tools come to the data rather than the other way around. This shift dramatically reduces friction and lets teams bring their tools of choice without compromising governance, consistency, or performance.

8.1 Revisiting the benefits of the lakehouse for consumption

8.1.1 Connecting the lakehouse to the people

8.2 Revisiting requirements from our audit

8.2.1 Interpreting requirements for consumption

8.2.2 Requirements for BI tools

8.2.3 Requirements for interactive notebook environments

8.2.4 Requirements for AI and specialized data consumption tools

8.3 Open interfaces for seamless consumption

8.3.1 JDBC and ODBC

8.3.2 Arrow Flight

8.3.3 Model Context Protocol (MCP)

8.4 Business intelligence tools in the lakehouse

8.4.1 Open source BI tools

8.4.2 Commercial BI tools

8.5 Tools for AI and machine learning workloads

8.6 Choosing the right consumption tools: Ten illustrated scenarios

8.6.1 Startup with a data science focus

8.6.2 Large financial institution with strict governance

8.6.3 Mid-sized e-commerce platform building embedded analytics

8.6.4 Decentralized media organization enabling self-service analytics

8.6.5 Government agency balancing public transparency and internal control

8.6.6 Healthcare provider with compliance and data locality constraints

8.6.7 Logistics company unifying real-time operations and historical analysis

8.6.8 SaaS company offering customizable data access to clients

8.6.9 Nonprofit organization supporting collaborative research

8.6.10 Manufacturing company enabling predictive maintenance

8.7 Summary