12 Additional training for voice assistants

 

This chapter covers

  • Collecting data to test and train a speech-to-text model
  • Evaluating the impact of a speech-to-text model on the AI assistant’s success metric
  • Identifying the most appropriate speech training option for a conversational AI assistant
  • Training custom speech recognition models for open-ended and constrained inputs

Fictitious Inc. built its first conversational AI assistant for a text chat channel. That assistant has been so successful that Fictitious Inc. now plans to add a telephone-based assistant. The new assistant will cover the same customer care intents as the original, but it will take voice input instead of text input.

Speech-to-text models transcribe audio into text. In a voice assistant, the user speaks, and the transcription of that audio is passed to the assistant for intent detection and dialogue routing. Previous chapters showed how assistants use text: data must be collected, and the assistant must be trained and tested.
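The flow described above (audio in, transcript out, then intent detection and dialogue routing) can be sketched in a few lines of Python. This is a minimal illustration, not a real speech service: `handle_utterance`, `fake_transcribe`, `keyword_intent`, `simple_route`, and the intent names are all hypothetical stand-ins, and the "audio" is just UTF-8 bytes so the example runs without any speech library.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Turn:
    """One pass through the voice pipeline."""
    transcript: str
    intent: str
    response: str


def handle_utterance(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    classify_intent: Callable[[str], str],
    route: Callable[[str], str],
) -> Turn:
    # Step 1: the speech-to-text model turns audio into text.
    transcript = transcribe(audio)
    # Step 2: the assistant detects an intent from the transcript.
    intent = classify_intent(transcript)
    # Step 3: the dialogue router picks a response for that intent.
    response = route(intent)
    return Turn(transcript, intent, response)


# Hypothetical stand-ins so the sketch is runnable end to end.
def fake_transcribe(audio: bytes) -> str:
    # Assumption: pretend the audio bytes already contain the spoken words.
    return audio.decode("utf-8")


def keyword_intent(text: str) -> str:
    # Assumption: a toy keyword rule in place of a trained classifier.
    return "reset_password" if "password" in text.lower() else "unknown"


def simple_route(intent: str) -> str:
    responses = {"reset_password": "I can help you reset your password."}
    return responses.get(intent, "Sorry, I didn't understand that.")


turn = handle_utterance(
    b"I forgot my password", fake_transcribe, keyword_intent, simple_route
)
print(turn)
```

Because the intent classifier only ever sees the transcript, any transcription mistake flows straight into intent detection. Swapping `fake_transcribe` for a function that mangles a key word shows how a speech error can change the detected intent.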

12.1 Collecting data to test a speech-to-text model

12.1.1 Call recordings as speech training data

12.1.2 Generating synthetic speech data

12.2 Testing the speech-to-text model

12.2.1 Word error rate

12.2.2 Intent error rate

12.2.3 Sentence error rate

12.3 Training a speech-to-text model

12.3.1 Custom training with a language model

12.3.2 Custom training with an acoustic model

12.3.3 Custom training with grammars

Summary