12 Generative Image Models
This chapter covers
- Understanding the intuition of generative image models
- Cleaning and preparing image data for training
- Grasping and implementing the steps to train a diffusion model
- Implementing major components for a diffusion model and U-Net
What are generative image models?
Generative image models are a family of algorithms and artificial neural network architectures that specialize in generating images that faithfully match a natural-language description.
Imagine a sculptor who has spent his life observing people who are deep in thought. For years, he walks around the town and studies every person he sees thinking deeply, paying close attention to their posture, expressions, and subtle details. Over time, he internalizes what a person who is thinking looks like. He might have seen hundreds of thousands of people pass through the town over the years.
We blindfold the sculptor, give him a random block of marble, and ask, “Make me a sculpture of a person thinking.” The sculptor can’t add new material to the marble; instead, he feels the block and chips away a small piece that he’s confident does not look like a person thinking. He repeats this thousands of times, each time feeling the edges of the marble and chipping away a little more “noise.” Slowly and methodically, a coherent image of a person thinking emerges from within the random block of stone (figure 12.1).
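To make the analogy concrete, here is a minimal sketch of the chip-away loop in Python with NumPy. It is a toy under loud assumptions, not a diffusion model: the function names (`oracle_noise_estimate`, `chip_away`), the 8x8 target, and the step settings are all hypothetical, and the "sculptor" is an oracle that already knows the target image. In a real model, a trained neural network would have to estimate the noise instead.

```python
import numpy as np


def oracle_noise_estimate(block, target):
    """Hypothetical stand-in for a trained network.

    Assumption: we cheat and look at the target, so everything that
    differs from it counts as "noise". A real model must predict this.
    """
    return block - target


def chip_away(target, steps=1000, chip_size=0.01, seed=0):
    """Start from random noise and remove a little estimated noise per step."""
    rng = np.random.default_rng(seed)
    block = rng.normal(size=target.shape)  # the random block of marble
    errors = []
    for _ in range(steps):
        noise = oracle_noise_estimate(block, target)
        block = block - chip_size * noise  # chip away a small piece
        errors.append(float(np.abs(block - target).mean()))
    return block, errors


# Toy "sculpture": a bright square on a dark background.
target = np.zeros((8, 8))
target[2:6, 2:6] = 1.0

result, errors = chip_away(target)
print(f"mean error after first step: {errors[0]:.4f}")
print(f"mean error after last step:  {errors[-1]:.6f}")
```

Each pass removes only 1% of the remaining estimated noise, so no single step produces the image; the shape emerges from many small corrections, which is the point of the sculptor story.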