Chapter 15. Queuing and stream processing: Illustration
This chapter covers
- Using Apache Storm
- Guaranteeing message processing
- Integrating Apache Kafka, Apache Storm, and Apache Cassandra
- Implementing the SuperWebAnalytics.com uniques-over-time speed layer
In the last chapter you learned about multi-consumer queues and the Storm model as a general approach to one-at-a-time stream processing. Let’s now look at how you can apply these ideas in practice using the real-world tools Apache Storm and Apache Kafka. We’ll conclude the chapter by implementing the SuperWebAnalytics.com speed layer for unique pageviews.
Apache Storm is an open source project that implements the Storm model (and is, in fact, where the model originated). You’ve seen that the core concepts in the Storm model are tuples, streams, spouts, bolts, and topologies. Let’s now implement streaming word count using the Apache Storm API. For reference, the word-count topology is repeated in figure 15.1.
To begin, you instantiate a TopologyBuilder object to define the application topology. The TopologyBuilder object exposes the API for specifying Storm topologies:
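A minimal sketch of this first step is shown below; the import assumes a recent Apache Storm release, where the classes live under the org.apache.storm package (older releases used the backtype.storm prefix):

```java
import org.apache.storm.topology.TopologyBuilder;

// The builder accumulates the spout and bolt definitions and is later
// used to produce the topology object that gets submitted to the cluster.
TopologyBuilder builder = new TopologyBuilder();
```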
Next, you add a spout that emits a stream of sentences. This spout is named sentence-spout and is given a parallelism of 8, meaning 8 threads will be spawned across the cluster to execute the spout:
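Continuing the sketch, registering the spout might look like the following. Here RandomSentenceSpout is a placeholder for any spout implementation that emits one sentence per tuple:

```java
// "sentence-spout" is the component id that downstream bolts use to
// subscribe to this stream. The final argument is the parallelism hint:
// 8 executors (threads) will be spawned across the cluster to run the spout.
// RandomSentenceSpout stands in for any IRichSpout that emits sentences.
builder.setSpout("sentence-spout", new RandomSentenceSpout(), 8);
```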