17 LSTMs and automatic speech recognition
This chapter covers
- Preparing training, testing, and evaluation datasets for automatic speech recognition using the LibriSpeech corpus
- Training and building a long short-term memory (LSTM) recurrent neural network (RNN) for converting speech to text
- Evaluating the LSTM performance during and after training
Talking to your electronic devices is commonplace nowadays, but it wasn’t always like that. Years ago, on an early version of my smartphone, I recall clicking the microphone button and using its dictation function, trying to speak an email into existence. Let’s just say the email that my boss received at work had a whole bunch of typos and phonetic errors, and he wondered whether I was mixing a little too much after-work activity with my official duties!
The world has evolved, and so has the accuracy with which neural networks perform automatic speech recognition (ASR), the process of transforming spoken audio into written text. If you think about it, whether you are asking your phone’s intelligent digital assistant to schedule a meeting, dictating that trusty email, asking your smart device at home to order something off the web or play background music, or even starting your car, it’s all powered by ASR functionality!