Storm Applied: Strategies for real-time event processing

This is an excerpt from Manning's book Storm Applied: Strategies for real-time event processing.

So far, we know that our spouts and bolts are each running as one or more instances. Each of these instances is running somewhere, right? There has to be some machine (physical or virtual) that’s actually executing our components. We’ll call this machine a worker node, and though a worker node isn’t the only type of node running on a Storm cluster, it is the node that executes the logic in our spouts and bolts. And because Storm runs on the JVM, each of these worker nodes is executing our spouts and bolts on a JVM. Figure 3.12 shows what we have so far.

Figure 3.12. A worker node is a physical or virtual machine that’s running a JVM, which executes the logic in the spouts and bolts.

There’s a little more to a worker node, but what’s important for now is that you understand that it runs the JVM that executes our spout and bolt instances. So we pose the question again: what are executors and tasks? An executor is a thread of execution on the JVM, and a task is an instance of a spout or bolt running within a thread of execution. Figure 3.13 illustrates this relationship.

Figure 3.13. Executors (threads) and tasks (instances of spouts/bolts) run on a JVM.

It’s really that simple. An executor is a thread of execution within a JVM. A task is an instance of a spout or bolt running within that thread of execution. When discussing scalability in this chapter, we’re referring to changing the number of executors and tasks. Storm provides additional ways to scale by changing the number of worker nodes and JVMs, but we’re saving those for chapters 6 and 7.
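The executor/task relationship can be sketched with plain JVM threads. This is a minimal illustration, not Storm code: the class names `ExecutorTaskSketch` and `BoltTask` are hypothetical stand-ins, with each thread playing the role of an executor and each `BoltTask` instance playing the role of a task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ExecutorTaskSketch {
    // Shared counter so we can observe that every task instance ran.
    static final AtomicInteger processed = new AtomicInteger();

    // A "task" stands in for one instance of a spout or bolt.
    static class BoltTask {
        void execute() { processed.incrementAndGet(); }
    }

    public static void main(String[] args) throws InterruptedException {
        // Two "executors" (plain threads), each owning two task instances:
        // four tasks total, two per executor.
        List<Thread> executors = new ArrayList<>();
        for (int e = 0; e < 2; e++) {
            List<BoltTask> tasks = List.of(new BoltTask(), new BoltTask());
            Thread executor = new Thread(() -> tasks.forEach(BoltTask::execute));
            executors.add(executor);
            executor.start();
        }
        for (Thread t : executors) t.join();
        System.out.println("tasks executed: " + processed.get()); // prints "tasks executed: 4"
    }
}
```

The key takeaway mirrors the text: the number of threads (executors) and the number of instances (tasks) are independent knobs, and several tasks can share one executor.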

Figure 7.1. The various types of nodes in a Storm cluster, with worker nodes broken down into worker processes and their parts

With the terminology defined, let’s get started with the first of the “common solution” recipes in our cookbook approach: changing the number of worker processes (JVMs) running on a worker node. Addressing these recipes now will provide a nice reference for later and allow us to focus on why each is a good solution for a particular scenario.

The number of worker processes running on a worker node is defined by the supervisor.slots.ports property in each worker node’s storm.yaml configuration file. This property defines the ports that each worker process will use to listen for messages. The next listing shows the default settings for this property.

Listing 7.1. Default settings for supervisor.slots.ports
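The listing body isn’t reproduced in this excerpt; as a sketch, the stock Apache Storm defaults define four slot ports, which allows up to four worker processes per worker node:

```yaml
# Default supervisor.slots.ports in storm.yaml: four ports,
# so up to four worker processes (JVMs) per worker node.
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703
```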

To increase the number of worker processes that can be run on a worker node, add a port to this list for each worker process to be added. The opposite holds true for decreasing the number of worker processes: remove a port for each worker process to be removed.
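For example, to allow a fifth worker process on a node, you might append one more port to the list (the specific port number here is an arbitrary choice, as long as the port is free on that machine):

```yaml
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703
    - 6704   # added: permits a fifth worker process on this node
```

Removing an entry works the same way in reverse: each deleted port reduces the node’s worker-process capacity by one.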

Figure 7.12. A worker node has a fixed amount of memory that’s being used by its worker processes along with any other processes running on that worker node.

If a worker node is experiencing memory contention, that worker node will be swapping. Swapping is the little death and needs to be avoided if you care about latency and throughput. That makes memory contention a real problem when using Storm: each worker node needs enough memory that its worker processes and the OS don’t swap. If you want to maintain consistent performance, you must keep Storm’s JVMs from swapping.
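One way to reason about this is to cap each worker JVM’s heap so the workers’ combined footprint leaves headroom for the OS. The numbers below are illustrative assumptions, not recommendations:

```yaml
# Hypothetical sizing sketch for a worker node with 4 GB of RAM
# running four worker processes (one per slot).
# Cap each worker JVM's heap so the four workers together
# (4 x 768 MB = 3 GB) leave roughly 1 GB for the OS and any
# other processes on the node, reducing the risk of swapping.
worker.childopts: "-Xmx768m"
```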
