Chapter 7. Machine learning


This chapter covers

  • Machine learning using graphs
  • Supervised learning: movie recommender system, spam detection
  • Unsupervised learning: document clustering, image segmentation via clustering
  • Semi-supervised learning: graph generation from numeric vectors
  • Using Spark MLlib with GraphX

Machine learning is a subset of the broader field of artificial intelligence (AI) that deals with predicting unknown data from a body of reference data. For example, given that you liked the film Star Wars, a machine learning system might predict whether you would also like The Empire Strikes Back.

Even though it’s a subset of AI, machine learning is an enormous topic. There are dozens of categories of machine learning algorithms and hundreds of standard algorithms, covering many use cases and employing a variety of techniques. Many of these algorithms use matrices as their primary data structure, but some use graphs instead. The MLlib component of Spark focuses on the matrix-based algorithms, though it relies on GraphX for a couple of them. The overlap runs in the other direction, too: GraphX includes one machine learning algorithm, SVDPlusPlus, for recommender systems.

7.1. Supervised, unsupervised, and semi-supervised learning

7.2. Recommend a movie: SVDPlusPlus

7.3. Using GraphX with MLlib

7.4. Poor man’s training data: graph-based semi-supervised learning

7.5. Summary