1 Introduction to Graphs

 

This chapter covers

  • An introduction to graph terminology
  • Using graph databases to solve problems of highly connected data
  • The advantages of graph databases over relational databases
  • How to identify if your problem is a graph problem

In May of 2016, the International Consortium of Investigative Journalists (ICIJ)[1] published a massive leak of more than 11 million documents, roughly 2.6 terabytes of data, in what has become known as the Panama Papers. The publication was a coordinated effort among journalists in nearly 80 countries to examine and connect information on more than 200,000 secret offshore companies based in Panama[2]. Their investigation named many celebrities, politicians, and their family members as potentially using offshore bank accounts to hide their fortunes. Because of the sheer volume of records and the way the data was interconnected, the ICIJ chose a graph database named Neo4j to handle and coordinate the distributed effort to sort out the relationships between the various people, organizations, possessions, and accounts. Why would you choose a graph database over a more standard tool, such as a relational database, to answer questions about who is connected to whom? We'll let Emil Eifrem, CEO and co-founder of Neo4j, Inc., answer this. About the Panama Papers, he said

1.1 What is a graph?

1.1.1 What is a graph database?

1.1.2 Why can't I use SQL?

1.2 Is my problem a graph problem?

1.2.1 Explore the questions

1.2.2 I’m still confused… Is this a graph problem?

1.3 Technologies used in this book

1.4 Summary
