Chapter 7. Batch layer: Illustration

This chapter covers

Sources of complexity in data-processing code
JCascalog as a practical implementation of pipe diagrams
Applying abstraction and composition techniques to data processing

In the last chapter you saw that pipe diagrams are a natural and concise way to specify computations over large amounts of data, and that they can be executed as a series of MapReduce jobs for parallelism and scalability.

In this illustration chapter, we’ll look at a tool that maps fairly directly onto pipe diagrams: JCascalog. There’s a lot to cover in JCascalog, so this chapter is more involved than the previous illustration chapters. As always, you can still learn the full theory of the Lambda Architecture without reading the illustration chapters. But with JCascalog in particular, we aim to open your mind to what is possible with data-processing tools. A key point is that your data-processing code is no different from any other code you write. As such, it requires good abstractions that are reusable and composable. Abstraction and composition are the cornerstones of good software engineering.

This chapter covers

7.1. An illustrative example

7.2. Common pitfalls of data-processing tools

7.3. An introduction to JCascalog

7.4. Composition

7.5. Summary