7 Learning at Scale
This chapter covers
- Strategies for handling data that overwhelms small systems
- Recognizing GNN problems that require scaled resources
- Seven robust techniques for mitigating issues arising from large data
- Using PyTorch Geometric (PyG) to tackle GNN scalability challenges
For most of our journey through GNNs, we have explained key architectures and methods but have limited our examples to problems of relatively small scale. We did this so that readers could readily access the example code and data, using computing notebooks hosted on free cloud resources such as Colab.
However, real-world problems in deep learning are rarely so neatly packaged. One of the many issues we can encounter in practice is training GNN models on data so large that it exceeds our memory or processing capacity.
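A quick back-of-the-envelope calculation shows how easily this happens. The sketch below estimates the memory needed just to hold the node feature matrix of a large graph; the node count and feature width are hypothetical values chosen only to illustrate the scale.

```python
# Back-of-the-envelope memory estimate for a node feature matrix.
# The node count and feature dimension below are hypothetical.
num_nodes = 100_000_000   # e.g., a large social or web graph
feat_dim = 256            # width of each node's feature vector
bytes_per_float32 = 4     # storage per float32 value

feature_bytes = num_nodes * feat_dim * bytes_per_float32
print(f"Node features alone: {feature_bytes / 1e9:.0f} GB")  # ~102 GB
```

At roughly 100 GB for features alone, before counting edges, activations, and gradients, such a graph cannot be trained full-batch on a typical GPU with 16 to 80 GB of memory.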
As we explore the challenges of scalability, it's crucial to have a clear mental model of the GNN training process. Figure 7.1 revisits our familiar visualization of this process. At its core, training a GNN revolves around acquiring data from a source, processing this data to extract relevant node and edge features, and then using these features to train a model. As the data grows in size, each of these steps can become increasingly resource-intensive, making the scaling strategies we explore in this chapter necessary.
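To ground figure 7.1 in code, here is a minimal sketch of that pipeline using PyG on a small benchmark graph. The Cora dataset, the two-layer GCN, and the hyperparameters are illustrative choices for this sketch, not a setup prescribed elsewhere in the book.

```python
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv

# Step 1: acquire data from a source. Cora stands in for any graph dataset.
dataset = Planetoid(root='data/Cora', name='Cora')
data = dataset[0]  # node features, edge index, labels, and train/test masks

# Step 2: a model that consumes the extracted node and edge features.
class GCN(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, out_dim)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)

model = GCN(dataset.num_features, 16, dataset.num_classes)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Step 3: train. Note the full-batch forward pass over the entire graph.
model.train()
for epoch in range(100):
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
```

The forward pass computes outputs for every node at once; when `data.x` and `data.edge_index` no longer fit in device memory, this step fails outright. PyG ships mini-batch loaders such as `NeighborLoader` that sample subgraphs instead of loading the whole graph, which is the kind of technique this chapter develops.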