10 Creating robust coverage for speech-to-text resolution
This chapter covers
- In-depth understanding of speech-to-text components and how ASR works
- How to create robust grammars that avoid the biggest pitfalls
- How modifying grammar coverage helps users succeed
Today’s automatic speech recognition (ASR) engines are amazing and better than ever, but they’re still not perfect. If you use dictation on your mobile phone, you’ve seen the mistakes they can make. Correctly turning someone’s spoken utterance into text is a very hard problem.
Adding dialog makes it even harder. Anything that goes wrong early in the recognition process is amplified in each later step, producing odd results. We human listeners avoid these thanks to robust cognitive processing that today’s computer implementations don’t have (maybe someday they will). If sounds aren’t recognized correctly, users’ words are misinterpreted, meaning the speech-to-text (STT) output is wrong. And if STT produces incorrect or unexpected words and phrases, natural language (NL) processing and intent evaluation won’t have what they need to succeed.
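To make that cascade concrete, here’s a minimal, hypothetical sketch of the last step: a toy keyword-based intent matcher that receives STT text. The intent names, the keyword sets, and the `match_intent` function are illustrative assumptions, not taken from any real system, and real NL processing is far more sophisticated. The point is only that a single misrecognized word upstream can leave the intent step with nothing it can use.

```python
from typing import Optional

# Hypothetical intents and keywords, for illustration only.
INTENT_KEYWORDS = {
    "refill_prescription": {"refill", "prescription"},
    "cancel_subscription": {"cancel", "subscription"},
}


def match_intent(stt_text: str) -> Optional[str]:
    """Return the first intent whose keywords all appear in the STT text, else None."""
    words = set(stt_text.lower().split())
    for intent, keywords in INTENT_KEYWORDS.items():
        # Subset test: every keyword for this intent must be present in the transcript.
        if keywords <= words:
            return intent
    return None


# Correct STT output: the intent is found.
print(match_intent("I need to refill my prescription"))  # refill_prescription

# ASR misheard one word ("prescription" -> "subscription"): no intent matches.
print(match_intent("I need to refill my subscription"))  # None
```

A single substituted word in the transcript is enough to turn a successful request into a failed one, even though the downstream logic itself did nothing wrong.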