2006 IEEE Spoken Language Technology Workshop
DOI: 10.1109/slt.2006.326839

Dynamic Vocabulary Adaptation for a daily and real-time Broadcast News Transcription System

Abstract: The daily and real-time transcription of Broadcast News (BN) is a challenging task in both acoustic and language modeling. To achieve optimal performance, several problems have to be overcome. In particular, when transcribing BN data in highly inflected languages, vocabulary growth leads to high out-of-vocabulary (OOV) word rates. To address this problem, we propose a daily vocabulary and LM adaptation framework which directly extracts new words based on contemporary written news available on the Internet and some linguisti…
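To make the OOV word rate concrete, here is a minimal Python sketch of how it is usually measured and how adding words from recent written news can lower it. This is an illustration only, not the paper's method: the function names, the toy Portuguese word lists, and the frequency threshold `min_count` are all assumptions, and the simple count-based selection stands in for the linguistically informed selection the abstract alludes to.

```python
from collections import Counter


def oov_rate(tokens, vocabulary):
    """Return the fraction of running tokens that are not in the vocabulary."""
    if not tokens:
        return 0.0
    missing = sum(1 for token in tokens if token not in vocabulary)
    return missing / len(tokens)


def adapt_vocabulary(baseline, news_tokens, min_count=2):
    """Return the baseline vocabulary plus frequent new words from recent news.

    Assumption: a plain frequency threshold is used here as a stand-in
    for whatever selection criteria a real system would apply.
    """
    counts = Counter(news_tokens)
    new_words = {
        word
        for word, count in counts.items()
        if count >= min_count and word not in baseline
    }
    return set(baseline) | new_words


# Hypothetical toy data, for illustration only.
baseline_vocab = {"o", "governo", "anunciou", "hoje"}
todays_news = "o ministro anunciou medidas o ministro falou".split()
transcript = "o ministro anunciou hoje".split()

before = oov_rate(transcript, baseline_vocab)
adapted_vocab = adapt_vocabulary(baseline_vocab, todays_news)
after = oov_rate(transcript, adapted_vocab)
print(f"OOV rate before: {before:.2f}, after: {after:.2f}")
```

The threshold controls the trade-off between OOV word rate and vocabulary size: a lower `min_count` admits more new words and reduces OOVs, at the cost of a larger vocabulary.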

Cited by 10 publications (11 citation statements)
References 4 publications
“…In [Martins et al, 2006] we proposed a procedure for dealing with the OOV problem by dynamically increasing the baseline system vocabulary, reducing the impact of linguistic differences over time. Based on the OOV analysis, we focused our work in correcting errors resulting from OOV words mainly on verbs class.…”
Section: Vocabulary Adaptation Algorithm (mentioning)
confidence: 99%
“…This is the case of the European Portuguese language, where new names contain great deal of information and occur frequently in many domains as the BN one. Additionally, due to their inflectional structure, the verbs class represents another problem to overcome [1]. For a BN transcription system like the one used in this work, the ability to correctly address new words appearing in a daily basis, is an important factor to take in consideration for its performance.…”
Section: Introduction (mentioning)
confidence: 99%
“…Based on texts daily available on the Web, we defined two morpho-syntatic approaches to dynamically select the target vocabulary by trading off between the OOV word rate and vocabulary size [1] [2]. Using an IR engine [3] and the ASR hypotheses as query material, relevant documents are extracted from a dynamic and large-size dataset to generate a story-based LM to the multi-pass speech recognition framework.…”
Section: Introduction (mentioning)
confidence: 99%
“…This is because not only those unknown words cannot be recognized correctly, but the words surrounding them will be affected. Thus, many efforts have been made to deal with the issue of OOV words (Martins et al, 2006; Galescu, 2003; Bazzi and Glass, 2001), and various model units smaller than words have been examined to recognize OOVs from speech, such as phonemes (Bazzi and Glass, 2000a), variable-length phoneme sequence (Bazzi and Glass, 2001), syllable (Bazzi and Glass, 2000b) and sub-word (Galescu, 2003). Since the proper name is a typical category of OOV words and usually takes a very large proportion among all kinds of OOV words, it has been specially addressed in (Hu et al, 2006; Tanigaki et al, 2000).…”
Section: Introduction (mentioning)
confidence: 99%