Chapter 2. Introduction to YARN

This chapter covers

  • Understanding how YARN works
  • How MapReduce works as a YARN application
  • A look at other YARN applications

Imagine buying your first car, which upon delivery has a steering wheel that doesn’t function and brakes that don’t work. Oh, and it only drives in first gear. No speeding on winding back roads for you! That empty, sad feeling is familiar to those of us who want to run some cool new tech such as graph or real-time data processing with Hadoop 1,[1] only to be reminded that our powerful Hadoop clusters were good for one thing, and one thing only: MapReduce.

1 While you can do graph processing in Hadoop 1, it’s not a native fit. You either incur the inefficiency of a disk barrier between every iteration over your graph, or you hack around in MapReduce to avoid such barriers.

Luckily for us, the Hadoop committers took these and other constraints to heart and envisioned a Hadoop that would grow above and beyond MapReduce. YARN is the realization of this vision, and it’s an exciting new development that transforms Hadoop into a distributed computing kernel that can support any type of workload.[2] This broadens the range of applications that can run on Hadoop, with efficient support for computing models such as machine learning, graph processing, and other generalized computing frameworks (such as Tez), which are discussed later in this chapter.

2.1. YARN overview

2.2. YARN and MapReduce

2.3. YARN applications

2.4. Chapter summary