8 Considerations for GNN Projects

 

This chapter covers

  • Creating a graph data model from non-graph data
  • ETL and preprocessing from raw data sources
  • Creating datasets and data loaders with PyTorch Geometric

In this chapter, we describe the practical aspects of working with graph data and show how to convert non-graph data into a graph format. We explain some of the considerations involved in taking data from a raw state to a preprocessed format, including turning tabular or other non-graph data into graphs and preprocessing them for a graph-based ML package. In our mental model, shown in Figure 8.1, we are in the left half of the figure.

Figure 8.1 Mental model for graph training process. We are at the start of the process, where we prepare our data for training.

We’ll proceed as follows. In Section 8.1, we introduce an example problem that might require a GNN and discuss how to approach such a project. Section 8.2 goes into more detail on how to use non-graph data in graph models. We then put these ideas into action in Section 8.3 by taking a dataset from a raw file to preprocessed data, ready for training. Finally, Section 8.3.5 offers ideas for finding more graph datasets.
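To make the idea of turning tabular data into a graph concrete before the detailed sections, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the pipeline used later in the chapter: the `rows` table, the user names, and the `table_to_graph` helper are all hypothetical. Each row is treated as a directed edge, each distinct name becomes a node with an integer id, and the edges are stored as two parallel lists (source ids and target ids).

```python
# Hypothetical tabular data: each row says that one user follows another.
rows = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("alice", "carol"),
    ("dave", "alice"),
]


def table_to_graph(rows):
    """Assign integer ids to entities and build an edge list.

    Returns a dict mapping each name to a node id, and a pair of
    parallel lists [sources, targets] describing the directed edges.
    """
    node_ids = {}
    sources, targets = [], []
    for src, dst in rows:
        # Give every previously unseen entity the next free integer id.
        for name in (src, dst):
            if name not in node_ids:
                node_ids[name] = len(node_ids)
        sources.append(node_ids[src])
        targets.append(node_ids[dst])
    return node_ids, [sources, targets]


node_ids, edge_index = table_to_graph(rows)
print(node_ids)
print(edge_index)
```

The two-row layout was chosen because it matches the shape PyTorch Geometric expects for `edge_index`: assuming PyTorch and PyTorch Geometric are installed, a list like this can be converted with `torch.tensor(edge_index, dtype=torch.long)` and passed to `torch_geometric.data.Data(edge_index=...)`.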

8.1 A social network to introduce data preparation and project planning

 

8.1.1 Project definition

 
 
 

8.1.2 Project objectives and scope

 
 
 

8.2 Designing graph models

 
 

8.2.1 Get familiar with the domain and use case

 

8.2.2 Constructing the graph dataset and schemas

 
 
 

8.2.3 Creating instance models

 
 
 
 

8.2.4 Testing and refactoring

 

8.3 Data pipeline example

 
 
 

8.3.1 Raw data

 
 

8.3.2 The extract, transform, load (ETL) step

 
 
 

8.3.3 Data exploration and visualization

 
 

8.3.4 Preprocessing and loading data into PyTorch Geometric

 
 
 

8.3.5 Where to find graph data

 
 
 

8.4 Summary

 
 

8.5 References and further reading

 