
7 Learning at Scale


This chapter covers

  • Strategies for handling data overload in small systems
  • Recognizing GNN problems that require scaled resources
  • Seven robust techniques for mitigating issues arising from large data
  • Using PyTorch Geometric (PyG) to scale GNN training and address these challenges in practice

For most of our journey through GNNs, we have explained key architectures and methods but have limited our examples to problems of relatively small scale. We did this so that readers could readily access the example code and data, using computing notebooks hosted on free cloud resources such as Colab.

However, real-world problems in deep learning are rarely so neatly packaged. One of the many issues you can encounter in practice is training GNN models on data that is too large, exceeding available memory or processing capacity.

As we explore the challenges of scalability, it's crucial to have a clear mental model of the GNN training process. Figure 7.1 revisits our familiar visualization of this process. At its core, training a GNN revolves around acquiring data from a source, processing that data to extract relevant node and edge features, and then using those features to train a model. As the data grows, each of these steps becomes increasingly resource-intensive, making the scaling strategies we explore in this chapter necessary.
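To make this flow concrete, the listing below is a minimal sketch of the small-scale, full-batch training we have relied on so far in PyG. The Cora dataset, the two-layer GCN, and the hyperparameters are illustrative assumptions rather than this chapter's working example; the point is that every step shown here (loading the whole graph, holding all node features and edges in memory, computing a full-batch loss) becomes a bottleneck as the graph grows.

```python
# Minimal full-batch node classification in PyG: load a small graph,
# use its node features and edges directly, and train a two-layer GCN.
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv

# Small benchmark graph, used here purely for illustration.
dataset = Planetoid(root='data/Cora', name='Cora')
data = dataset[0]  # a single Data object: x, edge_index, y, train/val/test masks

class GCN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(dataset.num_node_features, 16)
        self.conv2 = GCNConv(16, dataset.num_classes)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = F.dropout(x, p=0.5, training=self.training)
        return self.conv2(x, edge_index)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model, data = GCN().to(device), data.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

model.train()
for epoch in range(200):
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    # Full-batch loss over the training nodes -- fine at this scale,
    # but exactly the step that breaks down on very large graphs.
    loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
```

At this scale, the entire graph and all intermediate activations fit comfortably in memory; the rest of the chapter examines what to do when they no longer can.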

7.1 Examples for this chapter

7.1.1 Amazon Products dataset

7.1.2 GeoGrid Inc

7.2 Framing problems of scale

7.2.1 Root causes

7.2.2 Symptoms

7.2.3 Crucial metrics

7.3 Techniques for tackling problems of scale

7.3.1 Seven techniques

7.3.2 General steps

7.4 Choice of hardware configuration

7.4.1 Types of hardware choices

7.4.2 Choice of processor and memory size

7.5 Choice of data representation

7.6 Choice of GNN algorithm

7.6.1 Time and space complexity

7.7 Batching using a sampling method

7.7.1 Two concepts: mini-batching and sampling

7.7.2 A glance at notable PyG samplers

7.8 Parallel & distributed processing

7.8.1 Using Distributed Data Parallel

7.8.2 Code example for Distributed Data Parallel

7.9 Training with remote storage

7.10 Graph coarsening

7.11 Summary

7.12 References and further reading