In chapters 5 and 6, you learned about using PyTorch on a small scale, instantiating tensors consisting of a few hundred data values and training machine learning models with just a few parameters. At the scale used in chapter 6, you could train a machine learning model by performing gradient descent under the assumption that the entire set of model parameters, the parameter gradients, and the full training data set fit easily in the memory of a single node and were thus readily available to the gradient descent algorithm.
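
To make that single-node assumption concrete, here is a minimal sketch of full-batch gradient descent in PyTorch at roughly that scale. The linear model, synthetic data, and learning rate are illustrative assumptions, not code from chapters 5 or 6; the point is that every tensor involved, including the whole training set, is held in memory at once.

```python
import torch

# Illustrative small-scale setup: the entire training set and all model
# parameters reside in the memory of a single node.
torch.manual_seed(0)
X = torch.randn(800, 3)                     # a few hundred training examples
y_true = X @ torch.tensor([2.0, -1.0, 0.5]) + 3.0

w = torch.zeros(3, requires_grad=True)      # just a few model parameters
b = torch.zeros(1, requires_grad=True)

LEARNING_RATE = 0.05
for epoch in range(200):
    y_pred = X @ w + b                      # forward pass over the full data set
    loss = torch.mean((y_pred - y_true) ** 2)
    loss.backward()                         # gradients for every parameter at once
    with torch.no_grad():
        w -= LEARNING_RATE * w.grad         # one gradient descent step
        b -= LEARNING_RATE * b.grad
        w.grad.zero_()                      # clear gradients before the next pass
        b.grad.zero_()
```

Because the data set fits in memory, each iteration computes the exact gradient over all examples; nothing needs to be partitioned, streamed, or synchronized across nodes.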