8 Considerations for GNN Projects

This chapter covers

Creating a graph data model from non-graph data
ETL and preprocessing from raw data sources
Creating datasets and data loaders with PyTorch Geometric

In this chapter, we describe the practical aspects of working with graph data and show how to convert non-graph data into a graph format. We explain the considerations involved in taking data from its raw state to a preprocessed format, including turning tabular or other non-graph data into graphs and preprocessing them for a graph-based ML package. In our mental model, shown in Figure 8.1, we are in the left half of the figure.

Figure 8.1 Mental model for the graph training process. We are at the start of the process, where we prepare our data for training.
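
To make the idea of turning tabular data into a graph concrete, here is a minimal sketch in plain Python. The table, its column names (`follower`, `followed`), and the helper `table_to_edge_lists` are hypothetical and exist only for illustration; they are not taken from this chapter's dataset. The sketch assigns each distinct entity an integer node ID and collects the edges as two parallel lists of source and target IDs.

```python
# Hypothetical tabular data: each row records that one person follows another.
rows = [
    {"follower": "ana", "followed": "ben"},
    {"follower": "ben", "followed": "cy"},
    {"follower": "ana", "followed": "cy"},
]


def table_to_edge_lists(rows, src_col, dst_col):
    """Map each distinct entity to an integer node ID and build an edge list.

    Returns the name-to-ID mapping and a two-row list [sources, targets],
    where column i describes the directed edge sources[i] -> targets[i].
    """
    node_ids = {}
    sources, targets = [], []
    for row in rows:
        # Assign the next free integer ID the first time a name is seen.
        for name in (row[src_col], row[dst_col]):
            node_ids.setdefault(name, len(node_ids))
        sources.append(node_ids[row[src_col]])
        targets.append(node_ids[row[dst_col]])
    return node_ids, [sources, targets]


node_ids, edge_index = table_to_edge_lists(rows, "follower", "followed")
```

The two-row layout mirrors the `edge_index` format that PyTorch Geometric expects, so, assuming PyTorch is installed, the result could be converted with `torch.tensor(edge_index, dtype=torch.long)`. Real projects usually also need node features and labels, which this sketch leaves out.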

We’ll proceed as follows. In Section 8.1, we introduce an example problem that might require a GNN and discuss how to approach such a project. Section 8.2 goes into more detail on how to use non-graph data in graph models. We then put these ideas into action in Section 8.3 by taking a dataset from a raw file to preprocessed data, ready for training. Finally, Section 8.3.5 offers ideas for finding more graph datasets.

8.1 A social network to introduce data preparation and project planning

8.1.1 Project definition

8.1.2 Project objectives and scope

8.2 Designing graph models

8.2.1 Get familiar with the domain and use case

8.2.2 Constructing the graph dataset and schemas

8.2.3 Creating instance models

8.2.4 Testing and refactoring

8.3 Data pipeline example

8.3.1 Raw data

8.3.2 The extract, transform, load (ETL) step

8.3.3 Data exploration and visualization

8.3.4 Preprocessing and loading data into PyTorch Geometric

8.3.5 Where to find graph data

8.4 Summary

8.5 References and further reading