8 Considerations for GNN Projects
This chapter covers
- Creating a graph data model from non-graph data
- ETL and preprocessing from raw data sources
- Creating datasets and data loaders with PyTorch Geometric
In this chapter, we describe the practical aspects of working with graph data, including how to convert non-graph data into a graph format. We explain some of the considerations involved in taking data from its raw state to a preprocessed format: turning tabular or other non-graph data into graphs, then preprocessing those graphs for a graph-based ML package. In our mental model, shown in Figure 8.1, we are in the left half of the figure.
Figure 8.1 Mental model for the graph training process. We are at the start of the process, where we prepare our data for training.

We’ll proceed as follows. Section 8.1 introduces an example problem that might call for a GNN and outlines how to approach such a project. Section 8.2 goes into more detail on how to use non-graph data in graph models. In Section 8.3, we put these ideas into action by taking a dataset from a raw file to preprocessed data, ready for training. Finally, Section 8.4 offers ideas for finding more graph datasets.
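To give a flavor of what turning tabular data into a graph can look like, here is a minimal sketch in plain Python. The records, the column names (`person`, `company`), and the helper `rows_to_graph` are hypothetical and chosen only for illustration; they are not the dataset or the code used later in the chapter. Each row becomes an edge, and each distinct value becomes a node with an integer id.

```python
from collections import defaultdict

# Hypothetical tabular records: each row links one person to one company.
rows = [
    {"person": "alice", "company": "acme"},
    {"person": "bob", "company": "acme"},
    {"person": "carol", "company": "globex"},
    {"person": "alice", "company": "globex"},
]


def rows_to_graph(rows, src_key, dst_key):
    """Assign integer ids to distinct values and return (node_index, edges)."""
    node_index = {}  # maps a raw value (e.g. "alice") to its integer node id
    edges = []       # list of (source_id, target_id) pairs, one per row
    for row in rows:
        pair = []
        for key in (src_key, dst_key):
            name = row[key]
            if name not in node_index:
                node_index[name] = len(node_index)  # next unused id
            pair.append(node_index[name])
        edges.append(tuple(pair))
    return node_index, edges


node_index, edges = rows_to_graph(rows, "person", "company")

# Build an undirected adjacency list from the edge list.
adjacency = defaultdict(set)
for u, v in edges:
    adjacency[u].add(v)
    adjacency[v].add(u)

print(node_index)
print(edges)
```

An integer-indexed edge list like this is close to what graph ML libraries expect as input. In PyTorch Geometric, for example, edges are typically stored as a tensor of shape `[2, num_edges]` in the `edge_index` attribute of a `Data` object.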