2010
DOI: 10.1109/tasl.2009.2038807

Improved Recognition of Spontaneous Hungarian Speech—Morphological and Acoustic Modeling Techniques for a Less Resourced Task

Cited by 26 publications (18 citation statements)
References 30 publications
“…Building language models for morphologically rich languages (like Hungarian) is a challenging task due to data sparseness, high OOV rate and large lexicons [15]. These issues are usually handled by using subword language models instead of the standard word-based ones ([16], [17], [18]). However, in our former studies we found that the benefit from using subword models highly depends on the amount of available training data and the speech genre of the recognition task ([19], [20], [21]).…”
Section: Language Modeling
Citation type: mentioning
confidence: 99%
“…All of these three effects - varying word order, large vocabulary and spontaneity - hamper statistic models' ability to yield consistent estimates by high confidence. Since data sparsity issues can be often handled by estimating language models on statically derived subword units (morphs) [5,9,10] in morphologically rich languages, we extended our investigation to morph-based language models, as well.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…In [7], subword RNNLMs were trained on Finnish and Estonian conversations and used for rescoring lattices generated with conventional back-off models. Subword language models have already been applied successfully for recognition of Hungarian conversational speech [10,19], but neural language models have not been used before to the best of our knowledge. We found only one mention of application of morph-based approximated RNNLMs in the first pass of an ASR system [13], however this paper did not provide morph-based ASR results.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…For a comprehensive survey of different methods, see [2]. The subword vocabularies selected using unsupervised machine learning methods have been shown to perform well [3,4], so neither a morphological analyzer nor an annotated training corpus is required. One common unsupervised method is Morfessor [5], which is used as a baseline method in this work.…”
Section: Introduction
Citation type: mentioning
confidence: 99%