Hadoop in Action

This is an excerpt from Manning's book Hadoop in Action.

Figure 1.1. A Hadoop cluster has many parallel machines that store and process large data sets. Client computers send jobs into this computer cloud and obtain results.

Figure 1.1 illustrates how one interacts with a Hadoop cluster. As you can see, a Hadoop cluster is a set of commodity machines networked together in one location.[2] Data storage and processing all occur within this “cloud” of machines. Different users can submit computing “jobs” to Hadoop from individual clients, which can be their own desktop machines located far from the Hadoop cluster.

[2] While not strictly necessary, machines in a Hadoop cluster are usually relatively homogeneous x86 Linux boxes. And they’re almost always located in the same data center, often in the same set of racks.

Depending on your data processing needs, your Hadoop workload can vary widely over time. You may have a few large data processing jobs that occasionally take advantage of hundreds of nodes, but those same nodes will sit idle the rest of the time. You may be new to Hadoop and want to get familiar with it before investing in a dedicated cluster. You may own a startup that needs to conserve cash and wants to avoid the capital expense of a Hadoop cluster. In these and other situations, it makes more sense to rent a cluster of machines than to buy one.
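
The rent-versus-buy reasoning above can be made concrete with a little arithmetic. The following is a minimal sketch, not a real pricing model: the function name `rent_vs_buy` and every figure passed to it are hypothetical placeholders, and the comparison deliberately ignores power, administration, and data transfer costs.

```python
def rent_vs_buy(nodes, busy_hours_per_month, hourly_rate_per_node,
                purchase_price_per_node, months):
    """Return (rental_cost, purchase_cost) over the given number of months.

    Renting is billed only for the hours the nodes are actually busy;
    buying is a one-time capital expense per node. Operating costs of an
    owned cluster are ignored to keep the sketch short.
    """
    rental_cost = nodes * busy_hours_per_month * hourly_rate_per_node * months
    purchase_cost = nodes * purchase_price_per_node
    return rental_cost, purchase_cost


# Hypothetical figures, for illustration only: 100 nodes that are busy
# 20 hours a month, compared over one year.
rent, buy = rent_vs_buy(
    nodes=100,
    busy_hours_per_month=20,
    hourly_rate_per_node=0.10,
    purchase_price_per_node=2000.0,
    months=12,
)
print(f"rent: {rent:.2f}  buy: {buy:.2f}")
```

With a mostly idle cluster the rental total stays far below the purchase price; as `busy_hours_per_month` grows, the gap narrows and buying eventually becomes the cheaper choice.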

9.4.2. Accessing your data from the Hadoop cluster

Because the Hadoop EC2 cluster is rented, data stored in the cluster (including in HDFS) is not persistent. Your input data has to persist somewhere else and be brought into the EC2 cluster for processing. Many options exist for where to keep your data and how to bring it into the Hadoop cluster, and each option has its trade-offs.
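
As one illustration of bringing data in and taking results out, the sketch below builds the standard `hadoop fs -put` and `hadoop fs -get` commands for copying between a local file system and HDFS. This is only one of the possible options, and the excerpt does not say which it prefers. The helper names and all paths are hypothetical, and the sketch assumes the `hadoop` command is on the `PATH` of the machine where it runs.

```python
import subprocess


def stage_in_command(local_path, hdfs_path):
    """Build the command that copies a local file or directory into HDFS."""
    return ["hadoop", "fs", "-put", local_path, hdfs_path]


def stage_out_command(hdfs_path, local_path):
    """Build the command that copies results from HDFS back to local disk."""
    return ["hadoop", "fs", "-get", hdfs_path, local_path]


def run(cmd, dry_run=True):
    """Print the command, or execute it when dry_run is False."""
    if dry_run:
        print(" ".join(cmd))
        return 0
    # check=True raises CalledProcessError if the hadoop command fails.
    return subprocess.run(cmd, check=True).returncode


# Hypothetical paths: copy input in before the job, copy output out after,
# because anything left in the rented cluster's HDFS is lost at shutdown.
run(stage_in_command("/data/input.txt", "/user/hadoop/input.txt"))
run(stage_out_command("/user/hadoop/output", "/data/output"))
```

Keeping `dry_run=True` lets you check the commands before running them on a live cluster.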