Chapter 9. Connecting the dots with GraphX
This chapter covers
- Using the GraphX API
- Transforming and joining graphs
- Using GraphX algorithms
- Implementing the A* search algorithm with the GraphX API
This chapter completes your tour of Spark components with an overview of GraphX, Spark’s graph-processing API. We’ll show you how to use GraphX and give you examples of graph algorithms in Spark: shortest paths, PageRank, connected components, and strongly connected components. If you’re interested in other algorithms available in Spark (triangle count, LDA, and SVD++), or in GraphX in general, Michael Malak and Robin East go into much more detail in their book Spark GraphX in Action (Manning, 2016), which we highly recommend.
A graph is a mathematical model of linked objects. It consists of vertices (the objects in the graph) and edges (the links that connect the vertices). In Spark, edges are directed (each has a source and a destination vertex), and both edges and vertices have property objects attached to them. For example, in a graph containing data about pages and links, the property objects attached to the vertices may contain a page’s URL, title, date, and so on, and the property objects attached to the edges may contain a description of the link (such as the contents of an HTML <a> tag).
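To make the idea of property objects concrete, here is a minimal sketch of how the page-and-link graph just described might be built with GraphX. The Page and Link case classes, the PageGraphSketch object, its two helper methods, and the sample URLs are all hypothetical names invented for this illustration. The sketch also assumes a local Spark installation with the GraphX module on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Hypothetical property classes, invented for this illustration
case class Page(url: String, title: String) // vertex property object
case class Link(anchorText: String)         // edge property object

object PageGraphSketch {

  // Builds a tiny directed graph of three pages and two links.
  def buildGraph(sc: SparkContext): Graph[Page, Link] = {
    // Each vertex is a (unique Long ID, property object) pair
    val vertices: RDD[(VertexId, Page)] = sc.parallelize(Seq(
      (1L, Page("http://example.com/", "Home")),
      (2L, Page("http://example.com/about", "About")),
      (3L, Page("http://example.com/contact", "Contact"))
    ))
    // Each edge points from a source vertex ID to a destination vertex ID
    // and carries its own property object
    val edges: RDD[Edge[Link]] = sc.parallelize(Seq(
      Edge(1L, 2L, Link("About us")),
      Edge(1L, 3L, Link("Get in touch"))
    ))
    Graph(vertices, edges)
  }

  // Renders every edge together with the properties of both its endpoints.
  // Triplets are turned into strings before collecting them to the driver.
  def describeLinks(graph: Graph[Page, Link]): Seq[String] =
    graph.triplets
      .map(t => s"${t.srcAttr.title} -[${t.attr.anchorText}]-> ${t.dstAttr.title}")
      .collect()
      .toSeq

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("PageGraphSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      describeLinks(buildGraph(sc)).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Because the edges are directed, the link from Home to About does not imply a link back from About to Home. A two-way connection would need a second Edge with the source and destination IDs swapped.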