2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2015.7178964
Librispeech: An ASR corpus based on public domain audio books

Abstract: This paper introduces a new corpus of read English speech, suitable for training and evaluating speech recognition systems. The LibriSpeech corpus is derived from audiobooks that are part of the LibriVox project, and contains 1000 hours of speech sampled at 16 kHz. We have made the corpus freely available for download, along with separately prepared language-model training data and pre-built language models. We show that acoustic models trained on LibriSpeech give lower error rate on the Wall Street Journal (W…

Cited by 4,186 publications (2,723 citation statements) | References 15 publications
“…We used the Kaldi toolkit (Povey et al., 2011) and publicly available acoustic models trained on the LibriSpeech corpus (Panayotov et al., 2015). The forced alignment was spot-checked manually for accuracy and found to be very accurate.…”
Section: Calculating Reading Rate
confidence: 99%
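As a concrete illustration of the reading-rate calculation this excerpt refers to, here is a minimal Python sketch, assuming the Kaldi forced alignment has been exported to CTM format (one `utt-id channel start duration word` entry per line); the file name is hypothetical:

```python
# A minimal sketch: words per minute per utterance from a Kaldi CTM file.
# "align.ctm" is a hypothetical path; CTM lines are: utt channel start dur word.
from collections import defaultdict

def reading_rates(ctm_path):
    words = defaultdict(int)    # aligned word count per utterance
    end = defaultdict(float)    # end time of the last aligned word
    with open(ctm_path) as f:
        for line in f:
            utt, _chan, start, dur, _word = line.split()[:5]
            words[utt] += 1
            end[utt] = max(end[utt], float(start) + float(dur))
    # Reading rate in words per minute over each utterance's aligned span.
    return {utt: words[utt] / (end[utt] / 60.0) for utt in words}

print(reading_rates("align.ctm"))
```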
“…In order to verify our design, we used a set of models generated by the standard Kaldi model-generating recipe for the LibriSpeech acoustic corpus (Panayotov et al., 2015). Specifically, we used the Deep Neural Network-Weighted Finite State Transducer (DNN-WFST) hybrid with i-vector acoustic adaptation.…”
Section: Methods
confidence: 99%
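For context, here is a minimal sketch of how such a DNN-WFST decode with i-vector adaptation is typically driven from a standard Kaldi LibriSpeech recipe checkout (egs/librispeech/s5); the model, extractor, and graph directory names are assumptions, not the cited paper's setup:

```python
# A minimal sketch driving the standard Kaldi nnet3 decode scripts.
# Directory names (exp/nnet3/extractor, exp/chain/tdnn_sp, graph_tgsmall)
# are illustrative placeholders for a trained recipe's outputs.
import subprocess

def run(cmd):
    """Run a Kaldi shell script from the recipe directory, failing loudly."""
    subprocess.run(cmd, shell=True, check=True)

# 1. Extract online i-vectors for speaker adaptation of the DNN acoustic model.
run("steps/online/nnet3/extract_ivectors_online.sh --nj 8 "
    "data/test_clean exp/nnet3/extractor exp/nnet3/ivectors_test_clean")

# 2. Hybrid DNN-WFST decoding: the nnet3 model scores frames while the
#    WFST decoding graph (HCLG) constrains the search.
run("steps/nnet3/decode.sh --nj 8 "
    "--online-ivector-dir exp/nnet3/ivectors_test_clean "
    "exp/chain/tdnn_sp/graph_tgsmall data/test_clean "
    "exp/chain/tdnn_sp/decode_test_clean_tgsmall")
```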
“…The maximum norm for gradient clipping was set to 5. During model training, we applied dropout (rate 0.5) to the non-recurrent connections of the RNN (Zaremba et al., 2014) and to the hidden layers of the MLPs, and applied L2 regularization (λ = 10⁻⁴) to the parameters of the MLPs. For the evaluation in ASR settings, we used the acoustic model trained on the LibriSpeech dataset (Panayotov et al., 2015) and the language model trained on the ATIS training corpus. A 2-gram language model was used during decoding.…”
confidence: 99%
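A minimal PyTorch sketch of the regularization recipe this excerpt describes (dropout on the non-recurrent RNN connections and MLP hidden layers, L2 on the MLP parameters only, gradient clipping at norm 5); the model shape, optimizer, and dummy data are illustrative, not the cited paper's code:

```python
import torch
import torch.nn as nn

class RNNWithMLP(nn.Module):
    def __init__(self, vocab=1000, hidden=256, n_classes=10):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        # dropout= on nn.LSTM applies only between stacked layers, i.e. to
        # the non-recurrent connections (Zaremba et al., 2014).
        self.rnn = nn.LSTM(hidden, hidden, num_layers=2,
                           dropout=0.5, batch_first=True)
        # Dropout on the MLP hidden layer.
        self.mlp = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Dropout(0.5),
                                 nn.Linear(hidden, n_classes))

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return self.mlp(h[:, -1])          # classify from the last time step

model = RNNWithMLP()
# L2 regularization (weight_decay = 1e-4) on the MLP parameters only;
# the RNN and embedding parameter groups get no weight decay.
optimizer = torch.optim.SGD([
    {"params": model.mlp.parameters(), "weight_decay": 1e-4},
    {"params": model.rnn.parameters()},
    {"params": model.embed.parameters()},
], lr=0.1)

loss = model(torch.randint(0, 1000, (4, 12))).sum()   # dummy forward pass
loss.backward()
# Gradient clipping with maximum norm 5, as in the excerpt.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
optimizer.step()
```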
“…Speech utterances for simulating target and interference speech were picked from the Librispeech [32] dataset. They were divided into training, development, and test sets with no overlap.…”
Section: Signal Generation and Feature Extraction
confidence: 99%
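A minimal sketch of simulating a target-plus-interference mixture from two LibriSpeech utterances at a chosen SNR; the file paths, the 5 dB SNR, and the use of the `soundfile` package are assumptions, since the excerpt does not specify the mixing procedure:

```python
# A minimal sketch: mix a target utterance with an interfering one at a
# requested SNR. Paths and SNR below are illustrative placeholders.
import numpy as np
import soundfile as sf  # assumes the pysoundfile package is installed

def mix_at_snr(target, interference, snr_db):
    """Scale the interference so the mixture has the requested SNR (dB)."""
    n = min(len(target), len(interference))
    target, interference = target[:n], interference[:n]
    p_t = np.mean(target ** 2)
    p_i = np.mean(interference ** 2)
    scale = np.sqrt(p_t / (p_i * 10 ** (snr_db / 10)))
    return target + scale * interference

target, sr = sf.read("LibriSpeech/train-clean-100/19/198/19-198-0001.flac")
interf, _ = sf.read("LibriSpeech/train-clean-100/26/495/26-495-0002.flac")
mixture = mix_at_snr(target, interf, snr_db=5.0)
sf.write("mixture.wav", mixture, sr)
```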