Concept: DataLoader

This is an excerpt from Manning's book Deep Learning with PyTorch.
As data storage is often slow, in particular due to access latency, we want to parallelize data loading. But as the many things Python is well loved for do not include easy, efficient, parallel processing, we will need multiple processes to load our data, in order to assemble them into batches: tensors that encompass several samples. This is rather elaborate; but as it is also relatively generic, PyTorch readily provides all that magic in the DataLoader class. Its instances can spawn child processes to load data from a dataset in the background so that it’s ready and waiting for the training loop as soon as the loop can use it. We will meet and use Dataset and DataLoader in chapter 7.
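The number of child processes is controlled by the num_workers argument of the DataLoader constructor. As a minimal sketch, with a stand-in dataset of random tensors (the names and sizes here are placeholders, not from the book):

import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset: 1,000 random 3x32x32 "images" with binary labels.
dataset = TensorDataset(torch.randn(1000, 3, 32, 32),
                        torch.randint(0, 2, (1000,)))

loader = DataLoader(
    dataset,
    batch_size=64,   # each batch is a tensor encompassing 64 samples
    num_workers=4,   # four child processes load and collate batches in the background
)

for imgs, labels in loader:
    pass  # batches arrive ready-made; workers prefetch the next ones meanwhile

With num_workers left at its default of 0, loading happens in the main process instead. On platforms that spawn processes (Windows, macOS), code like this needs to run under an if __name__ == "__main__": guard.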
In our training code, we chose minibatches of size 1 by picking one item at a time from the dataset. The torch.utils.data module has a class that helps with shuffling and organizing the data in minibatches: DataLoader. The job of a data loader is to sample minibatches from a dataset, giving us the flexibility to choose from different sampling strategies. A very common strategy is uniform sampling after shuffling the data at each epoch. Figure 7.14 shows the data loader shuffling the indices it gets from the Dataset.
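To make that flexibility concrete: a non-uniform strategy can be plugged in through the sampler argument of the DataLoader constructor. The following sketch is not from the book; it uses a made-up, imbalanced dataset and PyTorch’s WeightedRandomSampler to draw roughly class-balanced minibatches:

import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Made-up imbalanced dataset: 90 samples of class 0, 10 of class 1.
labels = torch.cat([torch.zeros(90), torch.ones(10)]).long()
dataset = TensorDataset(torch.randn(100, 8), labels)

# Weight each sample inversely to its class frequency, so minibatches
# come out roughly class-balanced despite the 9:1 imbalance.
class_counts = torch.bincount(labels).float()
weights = 1.0 / class_counts[labels]

sampler = WeightedRandomSampler(weights, num_samples=len(dataset),
                                replacement=True)
loader = DataLoader(dataset, batch_size=16, sampler=sampler)

Passing a sampler is mutually exclusive with shuffle=True. The rest of this section sticks with the common shuffled, uniform strategy.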
Let’s see how this is done. At a minimum, the DataLoader constructor takes a Dataset object as input, along with batch_size and a shuffle Boolean that indicates whether the data needs to be shuffled at the beginning of each epoch:

train_loader = torch.utils.data.DataLoader(cifar2, batch_size=64, shuffle=True)
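With the loader in place, the training loop iterates over it, receiving a whole minibatch of images and labels at each step rather than a single sample. Since cifar2 itself isn’t constructed in this excerpt, the sketch below substitutes random tensors for it and uses a deliberately tiny model so it runs standalone; the model, loss, and learning rate are illustrative assumptions, not the book’s:

import torch
import torch.nn as nn

# Stand-ins so the sketch is self-contained; in the book, cifar2 holds
# real CIFAR-10 images from two classes.
cifar2 = torch.utils.data.TensorDataset(torch.randn(256, 3, 32, 32),
                                        torch.randint(0, 2, (256,)))
train_loader = torch.utils.data.DataLoader(cifar2, batch_size=64, shuffle=True)

model = nn.Sequential(nn.Linear(3 * 32 * 32, 2))  # toy classifier
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

for epoch in range(3):
    for imgs, labels in train_loader:      # imgs: (64, 3, 32, 32); labels: (64,)
        outputs = model(imgs.view(imgs.shape[0], -1))  # flatten each image
        loss = loss_fn(outputs, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

Because shuffle=True, the loader reshuffles the dataset indices at the start of each epoch.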