8 Generating with voice and pictures

This chapter covers

Transcribing audio to text
Generating audio from text
Images as prompt context
Generating images

Throughout history, we humans have developed several different ways of communicating with each other. Perhaps the oldest form of human communication is voice-based, where people speak and listen to each other. Text-based communication has taken many forms, from early hieroglyphs and the Phoenician alphabet to letters, emails, and SMS text messages. And sometimes a picture can, indeed, be worth a thousand words: works of art and photographs can communicate in ways that text and voice cannot match.

Thus far, our project has focused on text-based interaction with the Board Game Buddy application. Questions about games are sent in as text, and the answers come back as more text. Because humans will ultimately be the ones interacting with Board Game Buddy, it makes sense to offer more human-style communication with the application.

In this chapter, we’re going to leverage Spring AI to break away from text-based interaction, enabling speech-based and image-based communication in our application, both as input and output. Let’s start by seeing how Spring AI can enable us to add voice to an application.

8.1 Working with voice

8.1.1 Transcribing speech

8.1.2 Generating speech from text

8.2 Asking questions about images

8.3 Generating images

8.3.1 Specifying image options

8.4 Summary
