Table of Contents

Copyright

Brief Table of Contents

Table of Contents

Preface

Acknowledgments

About this Book

Author Online

About the author

About the cover illustration

Part 1. Hadoop—A Distributed Programming Framework

Chapter 1. Introducing Hadoop

1.1. Why “Hadoop in Action”?

1.2. What is Hadoop?

1.3. Understanding distributed systems and Hadoop

1.4. Comparing SQL databases and Hadoop

Scale-Out Instead of Scale-Up

Key/Value Pairs Instead of Relational Tables

Functional Programming (MapReduce) Instead of Declarative Queries (SQL)

Offline Batch Processing Instead of Online Transactions

1.5. Understanding MapReduce

1.5.1. Scaling a simple program manually

1.5.2. Scaling the same program in MapReduce

1.6. Counting words with Hadoop—running your first program

1.7. History of Hadoop

1.8. Summary

1.9. Resources

Chapter 2. Starting Hadoop

2.1. The building blocks of Hadoop

2.1.1. NameNode

2.1.2. DataNode

2.1.3. Secondary NameNode

2.1.4. JobTracker

2.1.5. TaskTracker

2.2. Setting up SSH for a Hadoop cluster

2.2.1. Define a common account

2.2.2. Verify SSH installation

2.2.3. Generate SSH key pair

2.2.4. Distribute public key and validate logins

2.3. Running Hadoop

2.3.1. Local (standalone) mode

2.3.2. Pseudo-distributed mode

2.3.3. Fully distributed mode

2.4. Web-based cluster UI

2.5. Summary

Chapter 3. Components of Hadoop

3.1. Working with files in HDFS

3.1.1. Basic file commands