This chapter covers
- Collecting data to test and train a speech-to-text model
- Evaluating the impact of a speech-to-text model on the AI assistant’s success metric
- Identifying the most appropriate speech training option for a conversational AI assistant
- Training custom speech recognition models for open-ended and constrained inputs
Fictitious Inc. launched its first conversational AI assistant on a text chat channel. That assistant has been so successful that Fictitious Inc. now plans to add a telephone-based assistant as well. The telephone assistant will cover the same customer care intents as the original, but it will take voice input rather than text input.
Speech-to-text models transcribe audio into text. In a voice assistant, the user speaks, and the transcription of that audio is passed to the assistant for intent detection and dialogue routing. In previous chapters, we have seen how assistants use text: data must be collected, and the assistant needs to be trained and tested.
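The flow from speech to intent can be sketched in a few lines of Python. This is a minimal illustration, not a real speech service. Every name in it (`transcribe`, `detect_intent`, `handle_utterance`, the intent names, and the keyword lists) is hypothetical. The "audio" is simply UTF-8 bytes standing in for a recording, and keyword matching stands in for a trained intent classifier.

```python
from dataclasses import dataclass


@dataclass
class AssistantResponse:
    transcript: str
    intent: str
    reply: str


# Hypothetical intents and keywords; a real assistant would use a trained classifier.
INTENT_KEYWORDS = {
    "reset_password": ["password", "locked out"],
    "store_hours": ["hours", "open", "close"],
}

# Hypothetical dialogue routing: one canned reply per intent.
REPLIES = {
    "reset_password": "I can help you reset your password.",
    "store_hours": "Let me look up our store hours.",
    "fallback": "I'm sorry, I didn't understand that.",
}


def transcribe(audio: bytes) -> str:
    """Stand-in for a speech-to-text model.

    Assumption: the 'audio' is UTF-8 text, so transcription is just decoding.
    """
    return audio.decode("utf-8").lower().strip()


def detect_intent(text: str) -> str:
    """Return the first intent whose keyword appears in the transcript."""
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return intent
    return "fallback"


def handle_utterance(audio: bytes) -> AssistantResponse:
    """Voice pipeline: audio -> transcript -> intent -> routed reply."""
    transcript = transcribe(audio)
    intent = detect_intent(transcript)
    return AssistantResponse(transcript, intent, REPLIES[intent])


if __name__ == "__main__":
    # A clean transcription reaches the right intent.
    print(handle_utterance(b"I forgot my password"))
    # A simulated transcription error ("pass word") falls through to the fallback.
    print(handle_utterance(b"I forgot my pass word"))
```

The second call shows why transcription quality matters: the assistant only ever sees the transcript, so a single mis-transcribed word can send the user to the wrong intent even when the intent detection itself is unchanged.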