
This is an excerpt from Manning's book Hadoop in Action.
Figure 1.1 illustrates how one interacts with a Hadoop cluster. As you can see, a Hadoop cluster is a set of commodity machines networked together in one location.[2] Data storage and processing all occur within this “cloud” of machines. Different users can submit computing “jobs” to Hadoop from individual clients, which can be their own desktop machines in locations remote from the Hadoop cluster.
2 While not strictly necessary, machines in a Hadoop cluster are usually relatively homogeneous x86 Linux boxes. And they’re almost always located in the same data center, often in the same set of racks.
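To make that client-to-cluster interaction concrete, here is a minimal sketch of what submitting a job from a client machine can look like with Hadoop's Java MapReduce API; it is an illustration, not the book's own listing. The class name SubmitWordCount, the namenode address, and the HDFS paths are placeholders, and Hadoop's built-in TokenCounterMapper and IntSumReducer stand in for your own map and reduce code. In practice the client usually picks up the cluster's addresses from its Hadoop configuration files rather than setting them in code.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class SubmitWordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Normally supplied by the client's core-site.xml; the hostname
        // and port here are made up for illustration only.
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(SubmitWordCount.class);
        job.setMapperClass(TokenCounterMapper.class);  // built-in mapper
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);      // built-in reducer
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Placeholder HDFS paths for the job's input and output.
        FileInputFormat.addInputPath(job, new Path("/user/alice/input"));
        FileOutputFormat.setOutputPath(job, new Path("/user/alice/output"));

        // The client blocks here while the cluster runs the job.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Compiled into a JAR, a driver like this is typically launched with the hadoop jar command from the client machine, and the framework takes care of shipping the code to the cluster for execution.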
Depending on your data processing needs, your Hadoop workload can vary widely over time. You may have a few large data processing jobs that occasionally take advantage of hundreds of nodes, but those same nodes will sit idle the rest of the time. You may be new to Hadoop and want to get familiar with it before investing in a dedicated cluster. You may own a startup that needs to conserve cash and wants to avoid the capital expense of a Hadoop cluster. In these and other situations, it makes more sense to rent a cluster of machines than to buy one.
Because a Hadoop cluster on EC2 is rented rather than owned, data stored in the cluster (including data in HDFS) is not persistent: it goes away when the cluster is released. Your input data therefore has to persist somewhere else and be brought into the EC2 cluster for processing. Many options exist for where to keep your data and how to bring it into the Hadoop cluster, and each option has its trade-offs.
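For example, one common arrangement (an illustration of a single option, not the book's prescription) is to keep the master copy of the input in Amazon S3 and copy it into the rented cluster's HDFS before running jobs, leaving the S3 copy intact so nothing is lost when the cluster is shut down. The sketch below uses Hadoop's FileSystem API for that copy; the bucket name, the paths, the namenode address, and the s3a:// scheme are placeholders (older Hadoop releases use the s3n:// scheme instead), and AWS credentials are assumed to be supplied through the usual Hadoop configuration properties.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class StageInputFromS3 {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Handles for the external store (S3) and the cluster's HDFS.
        FileSystem s3 = FileSystem.get(URI.create("s3a://my-bucket/"), conf);
        FileSystem hdfs =
            FileSystem.get(URI.create("hdfs://namenode.example.com:8020/"), conf);

        // Copy the dataset into HDFS for processing. Passing false for
        // deleteSource keeps the S3 copy, which is the persistent one;
        // the HDFS copy disappears along with the rented cluster.
        FileUtil.copy(s3, new Path("s3a://my-bucket/input/"),
                      hdfs, new Path("/user/alice/input/"),
                      false, conf);
    }
}

For large datasets you would typically run the transfer as a distributed copy job (Hadoop ships a distcp tool for this) rather than funneling everything through a single client process.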