Chapter 5. Indexing the data


This chapter covers

  • Creating and maintaining manual indexes
  • Using schema indexing and auto-indexing
  • Exploring trade-offs when creating an indexing strategy

In the previous chapter you saw how easy it is to move between nodes in the Neo4j graph by traversing relationships. Moving between nodes allows you to quickly and easily find connected nodes, such as a person’s friends and the movies they like. Neo4j is optimized to make graph traversal fast, but knowing where to start, and so reducing the number of nodes that need to be traversed, is important, and it becomes increasingly so as the size of the data set grows.
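To make the cost of not knowing where to start concrete, here is a minimal Python sketch. It is not Neo4j’s API: it models a tiny graph as plain dictionaries, and the names `nodes`, `friends`, `find_by_scan`, and `friends_of` are hypothetical. Finding a user by email without an index means inspecting nodes one by one, while the traversal from the found start node only touches that node’s neighbors.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory graph: node id -> properties, node id -> friend ids.
nodes: Dict[int, Dict[str, str]] = {
    1: {"name": "John", "email": "john@example.com"},
    2: {"name": "Kate", "email": "kate@example.com"},
    3: {"name": "Jack", "email": "jack@example.com"},
    4: {"name": "Emma", "email": "emma@example.com"},
}
friends: Dict[int, List[int]] = {1: [2, 3], 2: [1], 3: [1, 4], 4: [3]}


def find_by_scan(email: str) -> Tuple[Optional[int], int]:
    """Return (node id or None, number of nodes inspected) via a full scan."""
    inspected = 0
    for node_id, props in nodes.items():
        inspected += 1
        if props.get("email") == email:
            return node_id, inspected
    return None, inspected


def friends_of(node_id: int) -> List[str]:
    """Follow FRIEND relationships from a known start node."""
    return [nodes[f]["name"] for f in friends.get(node_id, [])]


start, cost = find_by_scan("emma@example.com")
assert start is not None
print(start, cost)        # 4 4  (every node was inspected)
print(friends_of(start))  # ['Jack']
```

In this toy model the scan cost grows linearly with the number of nodes, whereas the traversal cost depends only on how many relationships the start node has.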

To determine where to start in the graph, Neo4j uses indexing. An index in a relational database provides the ability to quickly and easily find rows in a table by the values of particular columns. Similarly, Neo4j indexing makes it easy to find nodes or relationships with particular property values. Unlike in a relational database, however, Neo4j’s manual indexes require your application code to create and maintain the index entries.
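The following Python sketch models what it means for application code to maintain an index. It is not Neo4j’s indexing API: `ManualIndex` and its `add`, `remove`, and `get` methods are hypothetical names for a map from a property key and value to a set of node ids. Because nothing updates the index automatically, changing a property without also updating the index leaves a stale entry.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Set, Tuple


class ManualIndex:
    """Hypothetical index mapping (key, value) pairs to sets of node ids."""

    def __init__(self) -> None:
        self._entries: DefaultDict[Tuple[str, str], Set[int]] = defaultdict(set)

    def add(self, node_id: int, key: str, value: str) -> None:
        self._entries[(key, value)].add(node_id)

    def remove(self, node_id: int, key: str, value: str) -> None:
        entry = self._entries.get((key, value))
        if entry is not None:
            entry.discard(node_id)

    def get(self, key: str, value: str) -> Set[int]:
        # Return a copy so callers cannot mutate the index directly.
        return set(self._entries.get((key, value), set()))


nodes: Dict[int, Dict[str, str]] = {}
index = ManualIndex()

# The application creates the node AND its index entry.
nodes[1] = {"email": "john@example.com"}
index.add(1, "email", "john@example.com")
print(index.get("email", "john@example.com"))  # {1}

# Changing the property without touching the index leaves a stale entry.
nodes[1]["email"] = "john@example.org"
print(index.get("email", "john@example.org"))  # set()
print(index.get("email", "john@example.com"))  # {1}  (stale)

# Keeping the index correct means removing the old entry and adding the new one.
index.remove(1, "email", "john@example.com")
index.add(1, "email", "john@example.org")
print(index.get("email", "john@example.org"))  # {1}
```

The design choice this illustrates is that every write path in the application that touches an indexed property also has to touch the index.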

Because the application code takes on the responsibility for indexing, you need to give careful thought to your indexing strategy. Poor decisions about indexing can lead to poor performance or excessive disk use. In this chapter we’ll show you how to create, maintain, and use indexes with Neo4j. We’ll then explore why you should index and discuss the inevitable trade-offs you’ll need to make when creating an indexing strategy.
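One way to reason about the trade-off is to count what each indexed property costs. The Python sketch below uses a deliberately simplified, hypothetical model, not a measurement of Neo4j: every indexed property on a node adds one stored index entry, and every change to an indexed property adds one index removal plus one index insertion. Real disk use and write cost depend on the index implementation, so treat the numbers as relative only.

```python
from typing import Dict, List, Set, Tuple


def index_cost(
    nodes: List[Dict[str, str]],
    indexed_keys: Set[str],
    updated_keys: List[str],
) -> Tuple[int, int]:
    """Return (stored index entries, extra index operations for the updates).

    Hypothetical model: one entry per indexed property present on a node;
    each update of an indexed property costs one remove plus one add.
    """
    stored = sum(1 for props in nodes for key in props if key in indexed_keys)
    extra_ops = sum(2 for key in updated_keys if key in indexed_keys)
    return stored, extra_ops


users = [
    {"name": "John", "email": "john@example.com", "age": "34"},
    {"name": "Kate", "email": "kate@example.com", "age": "29"},
    {"name": "Jack", "email": "jack@example.com", "age": "41"},
]
updates = ["age", "age", "email"]  # properties changed by later writes

print(index_cost(users, {"email"}, updates))                 # (3, 2)
print(index_cost(users, {"name", "email", "age"}, updates))  # (9, 6)
```

In this toy model, indexing only the property you look nodes up by (email) keeps both numbers low, while indexing every property triples the stored entries and makes each of these writes pay for index maintenance.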

5.1. Creating the index entry

5.2. Finding the user by their email

5.3. Dealing with more than one match

5.4. Dealing with changes to indexed data

5.5. Automatic indexing

5.6. The cost/benefit trade-off of indexing

5.7. Summary