Chapter 1. Two important technologies: Spark and graphs

 

This chapter covers

  • Why Spark has become the leading Big Data processing system
  • What makes graphs a unique way of modeling connected data
  • How GraphX makes Spark a leading platform for graph analytics

It’s well known that we are generating more data than ever. But it’s not just the individual data points that matter; the connections between them matter too. Extracting information from such connected datasets can yield insights in areas as varied as fraud detection, bioinformatics, and ranking pages on the web.

Graphs provide a powerful way to represent and exploit these connections. A graph represents a network of data points as vertices and encodes each connection as an edge between a pair of vertices. Graphs can be used to model problems in areas as diverse as computer vision, natural language processing, and recommender systems.
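To make the vertex-and-edge representation concrete, here is a minimal sketch in plain Python. It is not the GraphX API; the names (`vertices`, `edges`, `out_neighbors`, `in_degrees`) and the sample data are assumptions chosen purely for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: vertex id -> label.
vertices = {1: "Ann", 2: "Bill", 3: "Charles", 4: "Diane"}

# Directed edges as (source id, destination id, relationship) triples.
edges = [
    (1, 2, "follows"),
    (2, 3, "follows"),
    (3, 1, "follows"),
    (4, 1, "follows"),
]

def out_neighbors(edges):
    """Build an adjacency list: source vertex -> list of destination vertices."""
    adjacency = defaultdict(list)
    for src, dst, _ in edges:
        adjacency[src].append(dst)
    return dict(adjacency)

def in_degrees(vertices, edges):
    """Count incoming edges per vertex, including vertices with none."""
    degrees = {v: 0 for v in vertices}
    for _, dst, _ in edges:
        degrees[dst] += 1
    return degrees

adjacency = out_neighbors(edges)
in_degree = in_degrees(vertices, edges)
```

Even this small structure supports simple questions about the network, such as which vertex has the most incoming connections.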

With such a representation of connected data comes a whole raft of tools and techniques for mining the information content of the network. Among the many tools covered in this book, you’ll find PageRank (for finding the most influential members of the network), topic modeling with Latent Dirichlet Allocation (LDA), and the clustering coefficient (for discovering highly connected communities).
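As a rough illustration of how a measure like PageRank scores influence, the following sketch runs a simple power iteration in plain Python. It is a simplified textbook formulation, not the GraphX implementation; the function name `pagerank`, its parameters, the damping factor of 0.85, and the sample edges are all assumptions made for this example.

```python
def pagerank(edges, num_iterations=50, damping=0.85):
    """Simplified PageRank by power iteration over (source, destination) pairs."""
    vertices = sorted({v for edge in edges for v in edge})
    n = len(vertices)

    # Outgoing links per vertex.
    out_links = {v: [] for v in vertices}
    for src, dst in edges:
        out_links[src].append(dst)

    # Start from a uniform distribution.
    ranks = {v: 1.0 / n for v in vertices}

    for _ in range(num_iterations):
        # Every vertex receives a small base share.
        new_ranks = {v: (1.0 - damping) / n for v in vertices}
        for v in vertices:
            # A vertex with no outgoing links spreads its rank over all vertices.
            targets = out_links[v] or vertices
            share = damping * ranks[v] / len(targets)
            for t in targets:
                new_ranks[t] += share
        ranks = new_ranks
    return ranks

# Hypothetical sample graph: 1 -> 2 -> 3 -> 1, plus 4 -> 1.
sample_edges = [(1, 2), (2, 3), (3, 1), (4, 1)]
ranks = pagerank(sample_edges)
most_influential = max(ranks, key=ranks.get)
```

In this sample graph, vertex 1 ends up with the highest score because it receives links from two other vertices, while vertex 4, which nothing links to, keeps only the base share.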

1.1. Spark: the step beyond Hadoop MapReduce

 
 

1.2. Graphs: finding meaning from relationships

 

1.3. Putting them together for lightning-fast graph processing: Spark GraphX

 
 

1.4. Summary

 
 
 
 