8 Considerations for GNN projects

This chapter covers

Creating a graph data model from nongraph data
Extract, transform, load (ETL) and preprocessing of raw data sources
Creating datasets and data loaders with PyTorch Geometric

In this chapter, we describe the practical aspects of working with graph data, including how to convert nongraph data into a graph format. We explain the considerations involved in taking data from its raw state to a preprocessed format: turning tabular or other nongraph data into graphs and preprocessing them for a graph-based machine learning package such as PyTorch Geometric (PyG). In our mental model, shown in figure 8.1, we are in the left half of the figure.
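As a minimal sketch of the tabular-to-graph step, assume the raw data is a two-column CSV in which each row records a relationship between two entities. The column names `source` and `target`, the sample rows, and the helper names `table_to_coo` and `to_undirected` are all hypothetical. The code maps entity names to consecutive integer node IDs and builds a coordinate (COO) edge list. It uses only the Python standard library, so it runs without a GNN package installed.

```python
import csv
import io

# Hypothetical tabular data: each row links two entities (illustrative only).
RAW_CSV = """source,target
alice,bob
bob,carol
alice,carol
dave,alice
"""


def table_to_coo(csv_text, src_col="source", dst_col="target"):
    """Convert a two-column edge table into a node index map and a COO edge list.

    Returns (node_to_idx, edge_index), where edge_index is [sources, targets]
    holding integer node IDs (the layout PyG uses for `edge_index`).
    """
    node_to_idx = {}
    sources, targets = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for name in (row[src_col], row[dst_col]):
            # Assign consecutive integer IDs in order of first appearance.
            if name not in node_to_idx:
                node_to_idx[name] = len(node_to_idx)
        sources.append(node_to_idx[row[src_col]])
        targets.append(node_to_idx[row[dst_col]])
    return node_to_idx, [sources, targets]


def to_undirected(edge_index):
    """Add the reverse of every edge and drop duplicates, sorted by (src, dst)."""
    pairs = set(zip(edge_index[0], edge_index[1]))
    pairs |= {(dst, src) for src, dst in pairs}
    ordered = sorted(pairs)
    return [[s for s, _ in ordered], [d for _, d in ordered]]


if __name__ == "__main__":
    node_to_idx, edge_index = table_to_coo(RAW_CSV)
    print(node_to_idx)   # {'alice': 0, 'bob': 1, 'carol': 2, 'dave': 3}
    print(edge_index)    # [[0, 1, 0, 3], [1, 2, 2, 0]]
    print(to_undirected(edge_index))
```

In PyTorch Geometric, the resulting pair of lists could be wrapped with `torch.tensor(edge_index, dtype=torch.long)` and passed as the `edge_index` argument of `torch_geometric.data.Data`. PyG also provides `torch_geometric.utils.to_undirected`, which would normally replace the hand-written helper shown here.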

Figure 8.1 Mental model for graph training process. We’re at the start of the process, where we prepare our data for training.

We’ll proceed as follows. In section 8.1, we introduce an example problem that might call for a graph neural network (GNN) and explain how to plan a project to tackle it. Section 8.2 goes into more detail on how to use nongraph data in graph models. We then put these ideas into action in section 8.3 by taking a dataset from a raw file to preprocessed data that is ready for training. Finally, section 8.4 gives ideas for finding more graph datasets.

8.1 Data preparation and project planning

8.1.1 Project definition

8.1.2 Project objectives and scope

8.2 Designing graph models

8.2.1 Get familiar with the domain and use case

8.2.2 Constructing the graph dataset and schemas

8.2.3 Creating instance models

8.2.4 Testing and refactoring

8.3 Data pipeline example

8.3.1 Raw data

8.3.2 The ETL step

8.3.3 Data exploration and visualization

8.3.4 Preprocessing and loading data into PyG

8.4 Where to find graph data

Summary