5 Introducing PyTorch: Tensor Basics

This chapter covers:

  • Introducing PyTorch and PyTorch tensors
  • Using PyTorch tensor creation methods
  • Understanding tensor operations and broadcasting
  • Exploring PyTorch tensor performance on CPUs

In the previous chapter, you started with a cleaned-up version of the DC taxi dataset and applied a data-driven sampling procedure to identify the right fraction of the dataset to allocate to a held-out test data subset. You also used a Jupyter notebook to analyze the results of the sampling experiments and then launched a PySpark job to generate three separate data subsets: training, validation, and test.

This chapter takes you on a temporary detour away from the DC taxi dataset to prepare you to write scalable machine learning code using PyTorch. Don't worry: chapter 7 returns to the DC taxi dataset to benchmark a baseline PyTorch machine learning model. In this chapter, you focus on learning about PyTorch, one of the top frameworks for deep learning and many other types of machine learning algorithms. Having used TensorFlow 2.0, Keras, and PyTorch for machine learning projects that required distributed training on a machine learning platform, I found PyTorch to be the best fit. PyTorch scales from mission-critical, production machine learning use cases at Tesla [1] to state-of-the-art research at OpenAI [2].

5.1  Getting started with tensors
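As a quick preview of what this section develops, here is a minimal sketch of the core tensor vocabulary: a tensor is an n-dimensional array described by its rank (number of dimensions), shape, and data type. The example values are illustrative, not taken from the chapter body.

```python
import torch

# Illustrative examples of tensors of increasing rank
scalar = torch.tensor(3.14)                # rank 0: a single value
vector = torch.tensor([1.0, 2.0, 3.0])     # rank 1: a 1D array
matrix = torch.tensor([[1.0, 2.0],
                       [3.0, 4.0]])        # rank 2: a 2D array

print(scalar.dim(), vector.dim(), matrix.dim())  # ranks: 0 1 2
print(matrix.shape)                              # torch.Size([2, 2])
print(matrix.dtype)                              # torch.float32
```

Note that PyTorch infers `float32` as the default floating-point dtype from the Python values used to create the tensor.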

5.2  Getting started with PyTorch tensor creation operations
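As a hedged preview of this section's scope, the sketch below shows a few of the standard PyTorch tensor creation methods; the specific values and shapes are chosen only for illustration.

```python
import torch

zeros = torch.zeros(2, 3)          # 2x3 tensor of 0.0 values
ones = torch.ones(2, 3)            # 2x3 tensor of 1.0 values
sevens = torch.full((2, 3), 7.0)   # 2x3 tensor filled with 7.0
identity = torch.eye(3)            # 3x3 identity matrix
from_list = torch.tensor([[1, 2],
                          [3, 4]])  # tensor built from a Python list

print(zeros.shape)                 # torch.Size([2, 3])
print(sevens[0, 0].item())         # 7.0
```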

5.3  Creating PyTorch tensors of pseudo-random and interval values
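To preview this section, the following sketch contrasts pseudo-random creation methods with interval-based ones; seeding the generator makes the pseudo-random values reproducible. The shapes and ranges are illustrative assumptions.

```python
import torch

torch.manual_seed(42)                  # make pseudo-random values reproducible

uniform = torch.rand(2, 2)             # uniform values in [0, 1)
normal = torch.randn(2, 2)             # samples from a standard normal
ints = torch.randint(0, 10, (3,))      # integers in [0, 10)

steps = torch.arange(0.0, 1.0, 0.25)   # step-based interval: 0.00, 0.25, 0.50, 0.75
points = torch.linspace(0.0, 1.0, 5)   # 5 evenly spaced points, endpoints included
```

Note the difference between the two interval methods: `arange` excludes the upper bound, while `linspace` includes both endpoints.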

5.4  PyTorch tensor operations and broadcasting
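As a minimal sketch of the broadcasting idea this section covers: when two tensors of compatible shapes are combined, the smaller one is virtually expanded to match the larger, without copying data. The tensors below are illustrative.

```python
import torch

matrix = torch.tensor([[1.0, 2.0, 3.0],
                       [4.0, 5.0, 6.0]])   # shape (2, 3)
row = torch.tensor([10.0, 20.0, 30.0])     # shape (3,)

# Broadcasting: the (3,) row is virtually expanded to shape (2, 3)
summed = matrix + row
scaled = matrix * 2                        # a scalar broadcasts to every element

print(summed)  # [[11., 22., 33.], [14., 25., 36.]]
print(scaled)  # [[2., 4., 6.], [8., 10., 12.]]
```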

5.5  PyTorch tensors vs. native Python lists
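As a preview of the comparison this section makes, the sketch below times doubling every value in a native Python list against the equivalent single tensor operation. On most CPUs the tensor version wins because PyTorch dispatches to vectorized C++ kernels, but the exact timings vary by machine and are illustrative only.

```python
import timeit

import torch

n = 100_000
py_list = list(range(n))
tensor = torch.arange(n, dtype=torch.float32)

# Doubling every value: a Python-level loop vs. one vectorized tensor op
list_seconds = timeit.timeit(lambda: [x * 2 for x in py_list], number=10)
tensor_seconds = timeit.timeit(lambda: tensor * 2, number=10)

print(f"list: {list_seconds:.4f}s  tensor: {tensor_seconds:.4f}s")
```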

5.6  Summary