Chapter 18. Lambda Architecture in depth

This chapter covers

Revisiting the Lambda Architecture
Incremental batch processing
Efficiently managing resources in batch workflows
Merging logic between batch and realtime views

In chapter 1 you were introduced to the Lambda Architecture and its general-purpose approach for implementing any data system. Every chapter since then has dived into the details of the various components of the Lambda Architecture. As you’ve seen, there’s a lot involved in building Big Data systems that not only scale but are also robust and easy to understand.

Now that you’ve had a chance to dive into all the different layers of the Lambda Architecture, let’s use that newfound knowledge to review the Lambda Architecture once more and achieve a better understanding of it. We’ll fill in any remaining gaps and explore variations on the methodologies that have been discussed so far.

18.1. Defining data systems

We started with a simple question: “What does a data system do?” The answer was also simple: a data system answers questions based on data you’ve seen in the past. Or, put more formally, a data system computes queries that are functions of all the data you’ve ever seen. This is an intuitive definition that clearly encapsulates any data system you’d ever want to build:

query = function(all data)
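
To make the equation concrete, here is a minimal sketch in Python of a query written as a pure function over an entire dataset. The `Pageview` record, the `pageviews_per_url` function, and the sample data are hypothetical names chosen for illustration; they are not taken from any particular library or from earlier chapters.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Pageview(NamedTuple):
    # Hypothetical record type, used only for illustration.
    url: str
    user_id: int


def pageviews_per_url(all_data: Iterable[Pageview]) -> Counter:
    """A query expressed as a pure function of the entire dataset."""
    # Nothing is precomputed or cached: the answer is derived
    # from scratch by scanning every record.
    return Counter(record.url for record in all_data)


# "All data" here is a tiny in-memory list; in a real system it
# would be the full master dataset.
all_data = [
    Pageview("/home", 1),
    Pageview("/about", 2),
    Pageview("/home", 3),
]

result = pageviews_per_url(all_data)
```

Calling `pageviews_per_url(all_data)` recomputes the answer from every record on each invocation. That is the literal reading of the equation, and it is only practical for small datasets like the one in this sketch.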
