Machine Learning with TensorFlow, Second Edition MEAP V08

This is an excerpt from Manning's book Machine Learning with TensorFlow, Second Edition MEAP V08.

One set of open-source audio books is available from the Open Speech and Language Resources (OpenSLR) webpage as the LibriSpeech corpus. LibriSpeech is a set of short clips from audio books, paired with the transcripts of those clips. It includes about 1,000 hours of recorded 16 kHz English speech, along with metadata, the original MP3 files, and separate, aligned training sets of 100, 360, and 500 hours of speech. Besides the transcriptions, the dataset provides a dev set for per-epoch validation and a test set for post-training evaluation.
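To make the corpus layout concrete, here is a minimal Python sketch (not the book's code) that pairs each audio clip with its transcript. It assumes the standard LibriSpeech folder structure, `<subset>/<speaker>/<chapter>/`, where each chapter folder holds the `.flac` clips plus one `<speaker>-<chapter>.trans.txt` file whose lines read `<utterance-id> <TRANSCRIPT>`. The function names `parse_transcript_lines` and `load_subset` are hypothetical helpers, and `SPLITS` simply lists the subset folder names by role.

```python
from pathlib import Path

# LibriSpeech subset folder names, grouped by how the text says they are used.
SPLITS = {
    "train": ["train-clean-100", "train-clean-360", "train-other-500"],
    "dev": ["dev-clean", "dev-other"],    # per-epoch validation
    "test": ["test-clean", "test-other"],  # post-training evaluation
}


def parse_transcript_lines(lines):
    """Turn lines like '<utterance-id> SOME WORDS' into {utterance_id: text}."""
    transcripts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        utt_id, _, text = line.partition(" ")
        transcripts[utt_id] = text
    return transcripts


def load_subset(subset_dir):
    """Return (flac_path, transcript) pairs for one subset directory.

    Hypothetical helper; assumes the standard speaker/chapter layout, where
    each clip <utterance-id>.flac sits next to its chapter's .trans.txt file.
    """
    pairs = []
    for trans_file in sorted(Path(subset_dir).rglob("*.trans.txt")):
        with open(trans_file, encoding="utf-8") as f:
            transcripts = parse_transcript_lines(f)
        for utt_id, text in transcripts.items():
            pairs.append((trans_file.parent / f"{utt_id}.flac", text))
    return pairs
```

For example, `load_subset("LibriSpeech/dev-clean")` would give the validation pairs, assuming the archive was extracted under a `LibriSpeech/` folder.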

Unfortunately, the dataset isn’t directly usable by the deep speech model, because the model expects audio in the Waveform Audio File Format (.wav) rather than the Free Lossless Audio Codec (.flac) format that LibriSpeech comes in. So, as usual, your first machine-learning step is going to be, you guessed it, data preparation and cleaning!

Figure 17.1 The data cleaning and preparation process to transform the LibriSpeech OpenSLR data for the deep speech model.
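The FLAC-to-WAV step in figure 17.1 can be sketched in Python as shown next. This is a minimal sketch, not the book's pipeline: it assumes the third-party `soundfile` package (bindings to libsndfile) is installed for decoding FLAC, and it assumes the model wants 16-bit PCM WAV at 16 kHz. Because LibriSpeech is already 16 kHz, no resampling is done. The helper names are hypothetical; the header check uses only Python's standard `wave` module.

```python
import wave
from pathlib import Path


def wav_path_for(flac_path):
    """Write the .wav next to its .flac, keeping the utterance ID as the name."""
    return Path(flac_path).with_suffix(".wav")


def convert_flac_to_wav(flac_path):
    """Decode one FLAC clip and re-encode it as 16-bit PCM WAV.

    Assumption: the third-party `soundfile` package is installed. It is
    imported here so the rest of this module works without it.
    """
    import soundfile as sf

    wav_path = wav_path_for(flac_path)
    data, sample_rate = sf.read(str(flac_path), dtype="int16")
    sf.write(str(wav_path), data, sample_rate, subtype="PCM_16")
    return wav_path


def looks_like_model_input(wav_path, expected_rate=16000):
    """Check the WAV header for the assumed target: mono, 16-bit, 16 kHz."""
    with wave.open(str(wav_path), "rb") as w:
        return (
            w.getnchannels() == 1
            and w.getsampwidth() == 2
            and w.getframerate() == expected_rate
        )
```

To convert a whole subset, you could loop over `Path("LibriSpeech/train-clean-100").rglob("*.flac")` and call `convert_flac_to_wav` on each clip, then spot-check a few outputs with `looks_like_model_input`. Writing each `.wav` beside its `.flac` keeps the transcript lookup from the original folder layout unchanged.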