2 Pretrained networks

This chapter covers

Running pretrained image-recognition models
An introduction to GANs and CycleGAN
Captioning models that can produce text descriptions of images
Accessing models through PyTorch Hub and Hugging Face

We closed our first chapter promising to unveil amazing things in this chapter, and now it’s time to deliver. Computer vision is certainly one of the fields that have been most affected by the advent of deep learning, for a variety of reasons. The need to classify or interpret the content of natural images already existed, very large datasets became available, and new constructs such as convolutional layers were invented that could be run quickly on GPUs and delivered unprecedented accuracy. All of these factors combined with the internet giants’ desire to understand pictures taken by millions of users with their mobile devices and managed on said giants’ platforms. Quite the perfect storm.

2.1 A pretrained network that recognizes the subject of an image

2.1.1 Obtaining a pretrained network for image recognition

2.1.2 AlexNet

2.1.3 ResNet

2.1.4 Ready, set, almost run

2.1.5 Run!

2.2 A pretrained model that fakes it until it makes it

2.2.1 The GAN game

2.2.2 CycleGAN

2.2.3 A network that turns horses into zebras

2.3 Model zoos: PyTorch Hub and Hugging Face

2.4 A pretrained network that describes scenes

2.4.1 BLIP in action

2.5 Conclusion

2.6 Exercises

2.7 Summary