chapter five

5 Introducing PyTorch: Tensor basics

This chapter covers

Introducing PyTorch and PyTorch tensors
Using PyTorch tensor creation methods
Understanding tensor operations and broadcasting
Exploring PyTorch tensor performance on CPUs

In the previous chapter, you started with a cleaned-up version of the DC taxi data set and applied a data-driven sampling procedure to identify the right fraction of the data set to allocate to a held-out test subset. You also analyzed the results of the sampling experiments and then launched a PySpark job to generate three separate subsets of data: training, validation, and test.

This chapter takes you on a temporary detour from the DC taxi data set to prepare you to write scalable machine learning code using PyTorch. Don’t worry; chapter 7 returns to the DC taxi data set to benchmark a baseline PyTorch machine learning model. In this chapter, you will focus on learning PyTorch, one of the top frameworks for deep learning and many other types of machine learning algorithms. I have used TensorFlow 2.0, Keras, and PyTorch for machine learning projects that required distributed training on a machine learning platform, and in my experience PyTorch was the best fit for that work. PyTorch scales from mission-critical production machine learning use cases at Tesla¹ to state-of-the-art research at OpenAI.²

5.1 Getting started with tensors

5.2 Getting started with PyTorch tensor creation operations

5.3 Creating PyTorch tensors of pseudorandom and interval values

5.4 PyTorch tensor operations and broadcasting

5.5 PyTorch tensors vs. native Python lists

Summary