4 From pixels to pictures: Generating images

 

This chapter covers

  • Generative AI vision models, their architectures, and key enterprise use cases
  • Using Stable Diffusion’s GUIs and APIs for image generation and editing
  • Using advanced editing techniques, such as inpainting, outpainting, and image variations
  • Practical image generation tips for enterprises to consider

Generating images is one of the many uses of generative AI, producing unique and realistic content from a simple prompt. Enterprises have increasingly adopted generative AI to build image generation and editing solutions, leading to a wide range of use cases: AI-assisted architectural design, fashion design, avatar generation, virtual clothes try-on, and virtual patients for medical training, to name a few. These use cases are supported by products such as Microsoft Designer and Adobe Firefly, both of which are covered in this chapter.

In the previous chapters, we covered the fundamentals of generative AI and the technology that enables us to generate text, including completions and chats. In this chapter, we shift gears and explore how generative AI can be used to generate and edit images. We will see how simple it is to create an image, and we will highlight some of the complexities of getting images right.
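As a preview of just how little code a basic generation call requires, here is a minimal sketch using the open source Hugging Face diffusers library and a publicly hosted Stable Diffusion checkpoint. The library, model ID, and prompt are illustrative assumptions for this sketch, not the specific setup used later in the chapter.

# Minimal text-to-image sketch (assumes the Hugging Face diffusers library
# and the publicly available runwayml/stable-diffusion-v1-5 checkpoint).
import torch
from diffusers import StableDiffusionPipeline

# Load a pretrained Stable Diffusion pipeline; half precision keeps GPU memory low.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # move the model to the GPU if one is available

# Generate a single image from a text prompt and save it to disk.
prompt = "a watercolor illustration of a lighthouse at sunset"
image = pipe(prompt).images[0]
image.save("lighthouse.png")

A few lines of code produce an image, but, as the rest of this chapter shows, prompt wording, model choice, and editing techniques are where most of the real work lies.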

4.1 Vision models

4.1.1 Variational autoencoders

4.1.2 Generative adversarial networks

4.1.3 Vision transformer models

4.1.4 Diffusion models

4.1.5 Multimodal models

4.2 Image generation with Stable Diffusion

4.2.1 Dependencies

4.2.2 Generating an image

4.3 Image generation with other providers

4.3.1 OpenAI DALL·E 3

4.3.2 Bing Image Creator

4.3.3 Adobe Firefly

4.4 Editing and enhancing images using Stable Diffusion

4.4.1 Generating using the image-to-image API

4.4.2 Using the masking API

4.4.3 Resizing using the upscale API

4.4.4 Image generation tips

Summary