2015
DOI: 10.1007/978-3-319-24033-6_54

Open Source German Distant Speech Recognition: Corpus and Acoustic Model

Abstract: We present a new freely available corpus for German distant speech recognition and report speaker-independent word error rate (WER) results for two open source speech recognizers trained on this corpus. The corpus has been recorded in a controlled environment with three different microphones at a distance of one meter. It comprises 180 different speakers with a total of 36 hours of audio recordings. We show recognition results with the open source toolkit Kaldi (20.5% WER) and PocketSphinx (39.6% WER) and make…

Cited by 23 publications (9 citation statements)
References 16 publications (14 reference statements)
“…This is the same dataset that was used to train the original Wav2Letter model. The German models were trained on several corpora taken from the Bavarian Archive for Speech Signals (BAS) (Schiel, 1998;Reichel et al, 2016) as well as the dataset described in Radeck-Arneth et al (2015), which will be referred to as "RADECK" from now on. Overall, we had a total of 383 hours of training data, which is only slightly more than one third of the English corpus.…”
Section: Datasets (mentioning)
confidence: 99%
“…To our knowledge, the largest freely available corpora for German-English speech translation comprise triples for 37 hours of German audio, German transcription, and English translation (Stüker et al, 2012). Pure speech recognition data are available from 36 hours (Radeck-Arneth et al, 2015) to around 200 hours (Baumann et al, 2018). We present a corpus of sentence-aligned triples of German audio, German text, and English translation, based on German audio books.…”
Section: Introduction (mentioning)
confidence: 99%
“…The proposed framework achieved a word error rate of 9.2%. Radeck-Arneth et al [258,259] collected corpus in a controlled and clean environment for German distant speech. A total of 36 h of the corpus were recorded from 180 different speakers.…”
Section: German (mentioning)
confidence: 99%