Chapter 1. Introducing Storm


This chapter covers

  • What Storm is
  • The definition of big data
  • Big data tools
  • How Storm fits into the big data picture
  • Reasons for using Storm

Apache Storm is a distributed, real-time computational framework that makes processing unbounded streams of data easy. Storm can be integrated with your existing queuing and persistence technologies, consuming streams of data and processing or transforming them in many ways.

Still following us? Some of you are probably feeling smart because you know what that means. Others are searching for the proper animated GIF to express their level of frustration. There’s a lot in that description, so if you don’t grasp what all of it means right now, don’t worry. We’ve devoted the remainder of this chapter to clarifying exactly what we mean.

To appreciate what Storm is and when it should be used, you need to understand where Storm falls within the big data landscape. What technologies can it be used with? What technologies can it replace? Being able to answer questions like these requires some context.

1.1. What is big data?

To talk about big data and where Storm fits within the big data landscape, we need a shared understanding of what “big data” means. There are a lot of definitions of big data floating around, each with its own take. Here’s ours.

1.1.1. The four Vs of big data

Big data is best understood by considering four different properties: volume, velocity, variety, and veracity.[1]

1.2. How Storm fits into the big data picture

1.3. Why you’d want to use Storm

1.4. Summary