Chapter 8. The missing algorithms

 

This chapter covers

  • Reading RDF files
  • Merging graphs
  • Filtering out isolated vertices
  • Using IndexedRDD for performance gains
  • Taking a simplistic approach to finding graph isomorphisms
  • Computing the global clustering coefficient

You’ve seen examples of reading graph data from edge list files in earlier chapters. RDF is another important file format, used by many existing datasets. This chapter shows you how to read RDF files and apply that knowledge to the YAGO3 dataset.

Aside from the classic graph algorithms from chapter 6, there are other, slightly more modern algorithms that one comes to expect in a graph database or graph processing system. Some of these are missing: not implemented yet, or at least not commonly available in either the official Apache Spark distribution (as of Spark 1.6) or on spark-packages.org.

In this chapter, you’ll see how to implement some of these algorithms. You’ll also see how to use IndexedRDD for performance gains. IndexedRDD was originally written by one of the main GraphX code contributors but was never merged into the Apache Spark distribution.

8.1. Missing basic graph operations

 
 

8.2. Reading RDF graph files

 
 
 

8.3. Poor man’s graph isomorphism: finding missing Wikipedia infobox items

 
 
 
 

8.4. Global clustering coefficient: compare connectedness

 
 

8.5. Summary

 
 
 