Chapter 18. Lambda Architecture in depth
This chapter covers
- Revisiting the Lambda Architecture
- Incremental batch processing
- Efficiently managing resources in batch workflows
- Merging logic between batch and realtime views
In chapter 1 you were introduced to the Lambda Architecture and its general-purpose approach for implementing any data system. Every chapter since then has dived into the details of the various components of the Lambda Architecture. As you’ve seen, there’s a lot involved in building Big Data systems that not only scale but are also robust and easy to understand.
Now that you’ve had a chance to dive into all the different layers of the Lambda Architecture, let’s use that newfound knowledge to review the Lambda Architecture once more and achieve a better understanding of it. We’ll fill in any remaining gaps and explore variations on the methodologies that have been discussed so far.
We started with a simple question: “What does a data system do?” The answer was also simple: a data system answers questions based on data you’ve seen in the past. Or put more formally, a data system computes queries that are functions of all the data you’ve ever seen. This is an intuitive definition that clearly encapsulates any data system you’d ever want to build:
query = function(all data)
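As a minimal sketch of this definition, a query can be modeled as an ordinary function applied to the complete dataset. The record shape (URL and user ID pairs) and the names `run_query`, `pageviews_for`, and `unique_visitors_for` are assumptions made for illustration, not anything prescribed by the Lambda Architecture. The sketch deliberately recomputes from scratch on every call, which is the naive reading of the definition.

```python
from typing import Callable, Iterable, Tuple, TypeVar

Record = Tuple[str, str]  # hypothetical record shape: (url, user_id)
R = TypeVar("R")


def run_query(query_fn: Callable[[Iterable[Record]], R],
              all_data: Iterable[Record]) -> R:
    """Evaluate a query as a plain function of all the data."""
    return query_fn(all_data)


def pageviews_for(url: str) -> Callable[[Iterable[Record]], int]:
    """Build a query that counts every pageview recorded for `url`."""
    def query(records: Iterable[Record]) -> int:
        return sum(1 for record_url, _ in records if record_url == url)
    return query


def unique_visitors_for(url: str) -> Callable[[Iterable[Record]], int]:
    """Build a query that counts the distinct users who viewed `url`."""
    def query(records: Iterable[Record]) -> int:
        return len({user for record_url, user in records if record_url == url})
    return query


# Hypothetical dataset standing in for "all the data you've ever seen".
pageviews = [
    ("/home", "alice"),
    ("/home", "bob"),
    ("/about", "alice"),
    ("/home", "alice"),
]

home_views = run_query(pageviews_for("/home"), pageviews)
home_uniques = run_query(unique_visitors_for("/home"), pageviews)
```

Scanning the whole dataset for each query is simple and easy to reason about, but its cost grows with the size of the data, which is why the properties of queries matter when deciding how to compute them in practice.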
There are a number of properties you’re concerned about with your queries: