2003
DOI: 10.1007/978-3-540-24586-5_31
Decision Tree-Based Context Dependent Sublexical Units for Continuous Speech Recognition of Basque

Abstract: This paper presents a new methodology, based on classical decision trees, to obtain a suitable set of context-dependent sublexical units for Basque Continuous Speech Recognition (CSR). The original method proposed by Bahl [1] was applied as the benchmark. Two new features were then added: a data-massaging step to emphasise the data, and a fast and efficient growing-and-pruning algorithm for decision-tree (DT) construction. In addition, the use of the new context-dependent units to build word models was addressed. The ben…
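The abstract outlines the standard recipe behind such methods: grow a decision tree over phonetic contexts by repeatedly picking the binary question that yields the largest likelihood gain. The sketch below illustrates one grow step under strong simplifying assumptions (single-Gaussian 1-D acoustics, toy data, and invented phone-class questions); none of these names or values come from the paper itself.

```python
# Hypothetical minimal sketch of one grow step of Bahl-style decision-tree
# clustering of context-dependent units. Toy data and questions are assumptions.
import math

# Toy training "frames": (left_context, center_phone, right_context) -> 1-D acoustic values.
data = {
    ("b", "a", "t"): [1.0, 1.2, 0.9],
    ("p", "a", "k"): [1.1, 1.3],
    ("m", "a", "n"): [2.0, 2.2, 1.9],
    ("n", "a", "m"): [2.1, 1.8],
}

# Candidate binary questions about the left context (assumed phone classes).
QUESTIONS = {
    "L_is_plosive": lambda l, r: l in {"b", "p", "t", "k"},
    "L_is_nasal":   lambda l, r: l in {"m", "n"},
}

def log_likelihood(frames):
    """Single-Gaussian ML log-likelihood of a pooled set of frames."""
    n = len(frames)
    if n == 0:
        return 0.0
    mean = sum(frames) / n
    var = max(sum((x - mean) ** 2 for x in frames) / n, 1e-4)  # variance floor
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)

def best_split(contexts):
    """Grow step: pick the question with the largest likelihood gain."""
    base = log_likelihood([x for c in contexts for x in data[c]])
    best = None
    for name, q in QUESTIONS.items():
        yes = [c for c in contexts if q(c[0], c[2])]
        no = [c for c in contexts if not q(c[0], c[2])]
        if not yes or not no:
            continue  # question does not actually partition this node
        gain = (log_likelihood([x for c in yes for x in data[c]])
                + log_likelihood([x for c in no for x in data[c]]) - base)
        if best is None or gain > best[0]:
            best = (gain, name, yes, no)
    return best

gain, question, yes_set, no_set = best_split(list(data))
print(f"best question: {question}, gain: {gain:.2f}")
print("yes leaf:", yes_set)
print("no leaf:", no_set)
```

A pruning pass (as in the paper's growing-and-pruning algorithm) would then merge back splits whose gain does not hold up on held-out data; that step is omitted here.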

Cited by 1 publication (1 citation statement, published 2014); references 8 publications (8 reference statements).
“…The decomposed words are post-processed to produce a cleaner set of sublexical units, and boundary markers are added so that full words can be regenerated after recognition. Very short units are avoided, as they are usually difficult to recognize and can harm the overall WER through additional insertion errors [30,31]. To generate N-gram backoff sub-lexical LMs, different hybrid vocabularies are selected, in which the top 5k full-word forms are preserved.…”
Section: Language Model (LM)
Confidence: 99%
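The regeneration step the citing authors describe can be made concrete with a short sketch. The "+" boundary-marker convention and the example hypothesis below are assumptions for illustration, not taken from the cited work.

```python
# Hypothetical sketch of full-word regeneration from boundary-marked
# sublexical units after recognition. Convention assumed here: a unit
# ending in "+" is a non-final fragment that glues to the next unit.

def regenerate_words(hypothesis: str) -> str:
    """Rejoin sublexical units in a recognizer hypothesis into full words."""
    words, current = [], ""
    for unit in hypothesis.split():
        if unit.endswith("+"):          # non-final fragment of a word
            current += unit[:-1]
        else:                           # word-final unit: flush the word
            words.append(current + unit)
            current = ""
    if current:                         # trailing fragment (recognition error)
        words.append(current)
    return " ".join(words)

# Example: recognizer output mixing full words and marked fragments.
print(regenerate_words("the re+ cog+ nizer works"))  # -> "the recognizer works"
```

With this convention the language model operates over the mixed hybrid vocabulary (top full-word forms plus sublexical units), while word error rate is still scored on the regenerated full-word sequence.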