This chapter introduces techniques for large-scale training, such as training large models on large datasets with multiple GPUs. For datasets too big to fit into memory all at once, we’ll show you how to load them batch by batch during training. We’ll also introduce parallelization strategies for distributing the training and search processes across multiple GPUs. Finally, we’ll show you strategies for accelerating the search process with limited computing resources, using advanced search algorithms and search spaces.
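As a quick preview of the first technique, here is a minimal sketch of batch-by-batch loading, assuming a TensorFlow/Keras setup. The on-disk layout (feature and label shards saved as NumPy files), the file names, the ten-shard count, and the shapes are all hypothetical placeholders, not the chapter’s actual dataset.

```python
import numpy as np
import tensorflow as tf

# Hypothetical layout: feature/label shards such as "data/x_000.npy" and
# "data/y_000.npy"; each shard fits in memory on its own, but the full
# dataset does not.
def batch_generator(x_paths, y_paths, batch_size=32):
    for x_path, y_path in zip(x_paths, y_paths):
        x_shard, y_shard = np.load(x_path), np.load(y_path)
        # Yield one batch at a time so only a slice of a shard is in memory.
        for i in range(0, len(x_shard), batch_size):
            yield x_shard[i:i + batch_size], y_shard[i:i + batch_size]

# Wrap the generator in a tf.data pipeline so a Keras model can consume it.
dataset = tf.data.Dataset.from_generator(
    lambda: batch_generator(
        [f"data/x_{i:03d}.npy" for i in range(10)],
        [f"data/y_{i:03d}.npy" for i in range(10)],
    ),
    output_signature=(
        tf.TensorSpec(shape=(None, 28, 28), dtype=tf.float32),
        tf.TensorSpec(shape=(None,), dtype=tf.int64),
    ),
).prefetch(tf.data.AUTOTUNE)
```

Similarly, a common way to distribute training across multiple GPUs in TensorFlow is synchronous data parallelism via `tf.distribute.MirroredStrategy`; the toy model below is only for illustration.

```python
# Replicate the model on every visible GPU and split each batch among them.
# Variables must be created inside the strategy's scope.
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

model.fit(dataset, epochs=5)  # consumes the batched dataset built above
```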