Proceedings of the Workshop on Human Language Technology - HLT '94 1994
DOI: 10.3115/1075812.1075824
1993 benchmark tests for the ARPA spoken language program

Abstract: This paper reports results obtained in benchmark tests conducted within the ARPA Spoken Language program in November and December of 1993. In addition to ARPA contractors, participants included a number of "volunteers", including foreign participants from Canada, France, Germany, and the United Kingdom. The body of the paper is limited to an outline of the structure of the tests and presents highlights and discussion of selected results. Detailed tabulations of reported "official" results, and additional explan…

Cited by 95 publications (67 citation statements); references 16 publications.
“…These 83 transformation matrices were used to build bases for the EMLLR and the PARAFAC2-based model. For the adaptation test, we used the adaptation data of 8 testing speakers from the WSJ0 corpus, i.e., the November 92 NIST evaluation set [11]. We used 1 to 5 utterances from the adaptation set (an adaptation utterance was about 6 s in length).…”
Section: Methods (mentioning)
confidence: 99%
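The EMLLR and PARAFAC2-based models in the excerpt above are built on MLLR-style speaker adaptation, whose core operation is an affine transform applied to each Gaussian mean of the speaker-independent model. A minimal NumPy sketch of that single operation; the dimensions and random values are hypothetical, not taken from the cited work:

```python
import numpy as np

# MLLR-style mean adaptation: each Gaussian mean mu of the
# speaker-independent model is mapped to W @ [1; mu], where W is a
# speaker-specific transform estimated from adaptation utterances.
# Dimensions and values here are hypothetical illustrations.
rng = np.random.default_rng(0)
dim = 4                                   # feature dimension (hypothetical)
mu = rng.standard_normal(dim)             # one Gaussian mean from the SI model
W = rng.standard_normal((dim, dim + 1))   # speaker transform [b | A]
xi = np.concatenate(([1.0], mu))          # extended mean vector [1; mu]
mu_adapted = W @ xi                       # adapted mean = A @ mu + b
print(mu_adapted.shape)                   # (4,)
```

In full MLLR, W is chosen to maximize the likelihood of the adaptation data; schemes like EMLLR instead express W as a combination of basis transforms (here, the 83 matrices mentioned above) so that very little adaptation data is needed.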
“…After working on isolated-word, speaker-dependent systems for many years, since 1992 the community has moved towards very-large-vocabulary (20,000 words and more), high-perplexity, speaker-independent, continuous speech recognition. The best system in 1994 achieved an error rate of 7.2% on read sentences drawn from North American business news (Pallett, Fiscus, et al, 1994).…”
Section: [garbled] (mentioning)
confidence: 99%
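The "error rate of 7.2%" quoted above is a word error rate (WER), the standard metric of these ARPA benchmark tests: the word-level edit distance between the reference transcript and the recognizer's hypothesis, divided by the number of reference words. A small self-contained sketch; the example sentences are hypothetical:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words; substitutions,
    # insertions, and deletions each cost 1.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One deleted word out of six reference words.
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

Note that WER can exceed 100% when the hypothesis contains many insertions, which is why benchmark reports quote it as an error rate rather than an accuracy.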
“…In the ARPA program, the air travel planning domain has been chosen to support evaluation of spoken language systems (Pallett, 1991; Pallett, 1992; Pallett, Dahlgren, et al, 1992; Pallett, Fisher, et al, 1990; Pallett, Fiscus, et al, 1993; Pallett, Fiscus, et al, 1994; Pallett, Fiscus, et al, 1995). Vocabularies for these systems are usually about 2000 words.…”
Section: State of the Art (mentioning)
confidence: 99%
“…The training set consists of 60 hours of speech and the so-called WSJ test consists of 215 utterances [16]. The WSJ test is a 5k Hub test set.…”
Section: Wall Street Journal (mentioning)
confidence: 99%