Table of Contents

Brief Table of Contents

Table of Contents

Acknowledgments

About this book

About the author

About the cover illustration

Chapter 1. Introduction

1.1. What you’ll learn in this book

1.2. Why large datasets?

1.3. What is parallel computing?

1.3.1. Understanding parallel computing

1.3.2. Scalable computing with the map and reduce style

1.3.3. When to program in a map and reduce style

1.4. The map and reduce style

1.4.1. The map function for transforming data

1.4.2. The reduce function for advanced transformations

1.4.3. Map and reduce for data transformation pipelines

1.5. Distributed computing for speed and scale

1.6. Hadoop: A distributed framework for map and reduce

1.7. Spark for high-powered map, reduce, and more

1.8. AWS Elastic MapReduce—Large datasets in the cloud

Chapter 2. Accelerating large dataset work: Map and parallel computing
