Chapter 8. The missing algorithms
This chapter covers
- Reading RDF files
- Merging graphs
- Filtering out isolated vertices
- Using IndexedRDD for performance gains
- Taking a simplistic approach to finding graph isomorphisms
- Computing the global clustering coefficient
You’ve seen examples of reading graph data from edge list files in earlier chapters. RDF is another important file format, used by many existing datasets. This chapter shows you how to read RDF files and apply that knowledge to the YAGO3 dataset.
Aside from the classic graph algorithms from chapter 6, there are other, slightly more modern algorithms that one comes to expect in a graph database or graph processing system. Some of these are missing from GraphX: they are not implemented yet, or at least not commonly available in either the official Apache Spark distribution (as of Spark 1.6) or on spark-packages.org.
In this chapter, you’ll see how to implement some of these algorithms. You’ll also see how to use IndexedRDD for performance gains. IndexedRDD was originally written by one of the main GraphX code contributors but was never merged into the Apache Spark distribution.