Copyright
Brief Table of Contents
Table of Contents
Preface
Acknowledgments
About this Book
Author Online
About the author
About the cover illustration
1. Hadoop—A Distributed Programming Framework
Chapter 1. Introducing Hadoop
1.1. Why “Hadoop in Action”?
1.2. What is Hadoop?
1.3. Understanding distributed systems and Hadoop
1.4. Comparing SQL databases and Hadoop
Scale-Out Instead of Scale-Up
Key/Value Pairs Instead of Relational Tables
Functional Programming (MapReduce) Instead of Declarative Queries (SQL)
Offline Batch Processing Instead of Online Transactions
1.5. Understanding MapReduce
1.5.1. Scaling a simple program manually
1.5.2. Scaling the same program in MapReduce
1.6. Counting words with Hadoop—running your first program
1.7. History of Hadoop
1.8. Summary
1.9. Resources
Chapter 2. Starting Hadoop
2.1. The building blocks of Hadoop
2.1.1. NameNode
2.1.2. DataNode
2.1.3. Secondary NameNode
2.1.4. JobTracker
2.1.5. TaskTracker
2.2. Setting up SSH for a Hadoop cluster
2.2.1. Define a common account
2.2.2. Verify SSH installation
2.2.3. Generate SSH key pair
2.2.4. Distribute public key and validate logins
2.3. Running Hadoop
2.3.1. Local (standalone) mode
2.3.2. Pseudo-distributed mode
2.3.3. Fully distributed mode
2.4. Web-based cluster UI
2.5. Summary
Chapter 3. Components of Hadoop
3.1. Working with files in HDFS
3.1.1. Basic file commands