8 Considerations for GNN projects

This chapter covers

  • Creating a graph data model from nongraph data
  • Performing extract, transform, load (ETL) and preprocessing on raw data sources
  • Creating datasets and data loaders with PyTorch Geometric

In this chapter, we describe the practical aspects of working with graph data, including how to convert nongraph data into a graph format. We'll explain the considerations involved in taking data from a raw state to a preprocessed format. This work includes turning tabular or other nongraph data into graphs and then preprocessing those graphs for a graph-based machine learning package. In the mental model shown in figure 8.1, this chapter covers the left half of the figure.

Figure 8.1 Mental model for the graph training process. We're at the start of the process, where we prepare our data for training.
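Here is a minimal sketch of turning tabular data into a graph. The stdlib-only Python function below converts rows of (source, target, weight) into the coordinate (COO) edge list that PyTorch Geometric expects for `edge_index`, which has shape [2, num_edges]. It also maps raw IDs to contiguous integer node indices. The function name `rows_to_edge_index` and the three-column row layout are hypothetical and chosen only for illustration. Real raw data will usually need more cleaning during the ETL step.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def rows_to_edge_index(
    rows: Iterable[Tuple[Hashable, Hashable, float]],
    undirected: bool = False,
) -> Tuple[Dict[Hashable, int], List[List[int]], List[float]]:
    """Convert hypothetical (source, target, weight) rows into a COO edge list.

    Returns:
      - a mapping from raw node IDs to contiguous integer indices,
      - an edge index as two parallel lists [sources, targets],
      - a per-edge weight list aligned with the edge index.
    """
    node_to_idx: Dict[Hashable, int] = {}

    def index_of(node: Hashable) -> int:
        # Assign the next free integer the first time a raw ID is seen.
        if node not in node_to_idx:
            node_to_idx[node] = len(node_to_idx)
        return node_to_idx[node]

    sources: List[int] = []
    targets: List[int] = []
    weights: List[float] = []
    for src, dst, weight in rows:
        s, t = index_of(src), index_of(dst)
        sources.append(s)
        targets.append(t)
        weights.append(weight)
        if undirected and s != t:
            # Store the reverse direction too (self-loops are not duplicated).
            sources.append(t)
            targets.append(s)
            weights.append(weight)
    return node_to_idx, [sources, targets], weights


if __name__ == "__main__":
    rows = [("alice", "bob", 3.0), ("bob", "carol", 1.5), ("alice", "carol", 2.0)]
    mapping, edge_index, edge_weight = rows_to_edge_index(rows)
    print(mapping)     # {'alice': 0, 'bob': 1, 'carol': 2}
    print(edge_index)  # [[0, 1, 0], [1, 2, 2]]
```

With PyTorch Geometric installed, you could convert the returned lists with `torch.tensor(edge_index, dtype=torch.long)` and pass them to `torch_geometric.data.Data` as `edge_index`, with the weights as `edge_attr`. The sketch keeps the ID mapping separate so that you can join node features from other tables onto the same integer indices.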

We'll proceed as follows. In section 8.1, we introduce an example problem that might call for a graph neural network (GNN) and discuss how to approach such a project. Section 8.2 goes into more detail on how to use nongraph data in graph models. In section 8.3, we put these ideas into action by taking a dataset from a raw file to preprocessed data that is ready for training. Finally, section 8.4 offers ideas for finding more graph datasets.

8.1 Data preparation and project planning

8.1.1 Project definition

8.1.2 Project objectives and scope

8.2 Designing graph models

8.2.1 Get familiar with the domain and use case

8.2.2 Constructing the graph dataset and schemas

8.2.3 Creating instance models

8.2.4 Testing and refactoring

8.3 Data pipeline example

8.3.1 Raw data

8.3.2 The ETL step

8.3.3 Data exploration and visualization

8.3.4 Preprocessing and loading data into PyG

8.4 Where to find graph data

Summary