2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) 2007
DOI: 10.1109/asru.2007.4430103

Dynamic language modeling for a daily broadcast news transcription system

Cited by 23 publications (12 citation statements)
References 14 publications

“…In our case, we would like to define an automatic and optimized procedure to daily select the system vocabulary from three different corpora: an out-of-domain dataset (WEBNEWS-PT.train), an in-domain dataset (ALERT-SR.train+pilot) and the adaptation dataset daily collected from the Internet (WEBNEWS-PT.11march). For this purpose, in [Martins et al, 2007] we introduced a modified vocabulary selection technique that takes into account the differences in style across the various corpora, especially in case of written versus spoken style.…”
Section: Vocabulary Selection Using Morpho-syntactic Tagging (POS); citation type: mentioning
confidence: 99%
“…In [Martins et al, 2007a] we proposed a daily and unsupervised adaptation approach which dynamically adapts the active vocabulary and language model to the topic of the current news segment using a multi-phase speech recognition process. Based on contemporary texts daily available on the Web, a story-based vocabulary is selected using the morphosyntactic technique described in section 4.4.…”
Section: Multi-phase Adaptation Framework; citation type: mentioning
confidence: 99%
“…In this approach, the decoder search space is a large WFST that maps observation distributions to words. The language model (LM) in the one described in [10] with an active lexica size of 100K word. It is build based on a daily and unsupervised adaptation approach which dynamically adapts the active vocabulary and LM to the topic of the current news.…”
Section: TV Broadcast News Transcription System; citation type: mentioning
confidence: 99%
“…We have started by calculating the relative frequency value of each word in the three corpora, added these values for equal words, and selected the 100,000 words with the highest value. This extremely simple solution revealed itself effective, but there are other solutions for this problem, like morpho-syntactic analysis [12]. This selection method added 6,549 parliament transcriptions words that weren't in the initial broadcast news vocabulary.…”
Section: Vocabulary and Lexical Model; citation type: mentioning
confidence: 99%
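
The frequency-based selection described in the last citation statement (compute each word's relative frequency per corpus, sum the values for identical words, keep the top-ranked words) can be sketched roughly as follows. This is a minimal illustration, not the citing authors' implementation: the function names, the whitespace tokenization, the alphabetical tie-breaking, and the toy corpora are all assumptions made for the example, and the real system kept 100,000 words rather than 3.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Map each word to its count divided by the corpus size."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


def select_vocabulary(corpora, size):
    """Sum per-corpus relative frequencies and keep the `size` best words.

    `corpora` is a list of token lists, one per corpus. Ties are broken
    alphabetically (an assumption; the quoted text does not say how).
    """
    scores = Counter()
    for tokens in corpora:
        for word, rel_freq in relative_frequencies(tokens).items():
            scores[word] += rel_freq
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:size]]


# Toy stand-ins for the three corpora (hypothetical data).
broadcast = "the news the weather".split()
web = "the news sport".split()
parliament = "the law the vote the law law".split()

vocab = select_vocabulary([broadcast, web, parliament], size=3)
print(vocab)
```

Because each corpus is normalized before summing, a small corpus contributes as much to a word's score as a large one, which is how words specific to one source can enter the vocabulary.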