Appendix A. Introduction to PyTorch

This chapter covers

An overview of the PyTorch deep learning library
Setting up an environment and workspace for deep learning
Tensors as a fundamental data structure for deep learning
The mechanics of training deep neural networks
Training models on GPUs

This chapter equips you with the skills and knowledge needed to put deep learning into practice and implement large language models (LLMs) from scratch.

We will introduce PyTorch, a popular Python-based deep learning library, which will be our primary tool for the remainder of this book. This chapter will also guide you through setting up a deep learning workspace armed with PyTorch and GPU support.

Then, you'll learn about the essential concept of tensors and their usage in PyTorch. We will also delve into PyTorch's automatic differentiation engine, a feature that enables us to conveniently and efficiently use backpropagation, which is a crucial aspect of neural network training.
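As a quick preview of these two ideas, the following is a minimal sketch, assuming PyTorch is already installed. The variable names (`w`, `x`, `loss`) and the toy loss function are illustrative choices, not code from later sections. It creates two scalar tensors, computes a small loss, and lets the automatic differentiation engine compute the gradient of that loss with respect to `w`.

```python
import torch

# A tensor holding a single trainable parameter. Setting requires_grad=True
# asks PyTorch to record the operations applied to it so that gradients
# can be computed later.
w = torch.tensor(3.0, requires_grad=True)

# An input value; no gradient is needed for it.
x = torch.tensor(2.0)

# Forward pass: a toy "loss" (illustrative only) built from w and x.
loss = (w * x - 1.0) ** 2

# Backward pass: autograd applies backpropagation through the recorded
# operations and stores d(loss)/dw in w.grad.
loss.backward()

grad_w = w.grad.item()
print(grad_w)  # analytically: 2 * (w*x - 1) * x = 2 * 5 * 2 = 20.0
```

The gradient matches the hand-derived value, which is the convenience this chapter refers to: we write only the forward computation, and PyTorch derives the backward pass for us.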

Note that this chapter is meant as a primer for those who are new to deep learning in PyTorch. While it explains PyTorch from the ground up, it is not meant to be exhaustive coverage of the PyTorch library. Instead, it focuses on the PyTorch fundamentals that we will use to implement LLMs throughout this book. If you are already familiar with deep learning, you may skip this appendix and move directly on to chapter 2, working with text data.

A.1 What is PyTorch

A.1.1 The three core components of PyTorch

A.1.2 Defining deep learning

A.1.3 Installing PyTorch

A.2 Understanding tensors

A.2.1 Scalars, vectors, matrices, and tensors

A.2.2 Tensor data types

A.2.3 Common PyTorch tensor operations

A.3 Seeing models as computation graphs

A.4 Automatic differentiation made easy

A.5 Implementing multilayer neural networks

A.6 Setting up efficient data loaders

A.7 A typical training loop

A.8 Saving and loading models

A.9 Optimizing training performance with GPUs

A.9.1 PyTorch computations on GPU devices

A.9.2 Single-GPU training

A.9.3 Training with multiple GPUs

A.10 Summary

A.11 Further reading

A.12 Exercise answers