Chapter 1. Introducing Storm

This chapter covers

What Storm is
The definition of big data
Big data tools
How Storm fits into the big data picture
Reasons for using Storm

Apache Storm is a distributed, real-time computational framework that makes processing unbounded streams of data easy. Storm can be integrated with your existing queuing and persistence technologies, consuming streams of data and processing or transforming them in many ways.

Still following us? Some of you are probably feeling smart because you know what that means. Others are searching for the proper animated GIF to express their level of frustration. There’s a lot in that description, so if you don’t grasp what all of it means right now, don’t worry. We’ve devoted the remainder of this chapter to clarifying exactly what we mean.

To appreciate what Storm is and when it should be used, you need to understand where Storm falls within the big data landscape. What technologies can it be used with? What technologies can it replace? Being able to answer questions like these requires some context.

1.1. What is big data?

To talk about big data and where Storm fits within the big data landscape, we need a shared understanding of what “big data” means. Plenty of definitions of big data are floating around, each with its own take. Here’s ours.

1.1.1. The four Vs of big data

Big data is best understood by considering four different properties: volume, velocity, variety, and veracity.^[1]
