Chapter 5. Built-in algorithms
This chapter covers
- Algorithms that come with the GraphX API
- Detecting clusters within graphs: PageRank, Shortest Paths, Connected Components, Label Propagation
- Measuring connectedness of a graph or subgraph with Triangle Count
- Measuring the connectedness of a subset of users in a social network graph and finding isolated populations
In chapter 4 you learned about the foundational GraphX APIs that enable you to write your own custom algorithms. But there’s no need for you to reinvent the wheel in cases where the GraphX API already provides an implemented standard algorithm. In this chapter, we describe some of those basic algorithms and discuss which situations they can be used in:
- PageRank
- Personalized PageRank
- Triangle Count
- Shortest Paths
- Connected Components
- Strongly Connected Components
- Label Propagation
We wait until chapter 7 to cover SVDPlusPlus, one of the more useful and advanced built-in algorithms.
Chapter 2 covered an example invoking the PageRank algorithm, which was originally invented to rank web pages for a search engine, but there we used it to find influential papers in a citation network. Generally speaking, PageRank can be used to find the “important” nodes in almost any graph. Here we go more in depth into PageRank: what it does under the covers and parameters and different ways of invoking it. Note that PageRank is patented by Stanford University and trademarked by Google.