BYBLOS speech recognition benchmark results

Kubala, Francis; Austin, S.; Barry, Chris; Makhoul, John; Placeway, Paul; Schwartz, R.

doi:10.3115/112405.112415

Cited by 16 publications

(13 citation statements)

References 11 publications

“…Often a large number of mixture components are used and, since the parameters can be overtrained, contradictory results are reported on the benefits of parameter re-estimation. For example, while many researchers find it useful to reestimate all parameters of the mixture models in training, BBN reports no benefit for updating means and covariances after the initialization from clustered data [7].…”

Section: Previous Work (mentioning)

confidence: 99%

“…Separate sets of tied mixtures have been used for various input features including cepstra, derivatives of cepstra, and power and its derivative, where each of these feature sets have been treated as independent observation streams. Within an observation stream, different assumptions about feature correlation have been explored, with some researchers currently favoring diagonal covariance matrices [4,5] and others adopting full covariance matrices [6,7].…”

Section: Previous Work (mentioning)

confidence: 99%

“…The feature vectors used as input to the system are computed at 10 millisecond intervals and consist of 14 cepstral parameters, their first differences, and differenced energy (second cepstral differences are not currently used). In recognition, the SSM uses an N-best rescoring formalism to reduce computation: the BBN BYBLOS system [7] is used to generate 20 hypotheses per sentence, which are rescored by the SSM and combined with the number of phones, number of words, and (optionally) the BBN HMM score, to rerank the hypotheses. The weights for recombination are estimated on one test set and held fixed for all other test sets.…”

Section: Experimental Paradigm (mentioning)

confidence: 99%
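
The N-best rescoring step described in the statement above can be sketched roughly as follows. This is a minimal illustration and not the cited authors' implementation: the `Hypothesis` class, the field names, the weight keys, and all numeric values are hypothetical, and a plain weighted sum of scores and counts is assumed as the combination rule.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One entry of an N-best list (all field names are illustrative)."""
    text: str
    ssm_score: float  # log score assigned by the rescoring model
    hmm_score: float  # log score from the first-pass recognizer
    n_phones: int     # number of phones in the hypothesis
    n_words: int      # number of words in the hypothesis


def combined_score(h: Hypothesis, w: dict) -> float:
    """Weighted linear combination of scores and counts (assumed form)."""
    return (w["ssm"] * h.ssm_score
            + w["hmm"] * h.hmm_score
            + w["phones"] * h.n_phones
            + w["words"] * h.n_words)


def rerank(nbest: list, w: dict) -> list:
    """Return the hypotheses sorted from best to worst combined score."""
    return sorted(nbest, key=lambda h: combined_score(h, w), reverse=True)


# Hypothetical weights; in the quoted setup they are estimated on one
# test set and then held fixed for all other test sets.
weights = {"ssm": 1.0, "hmm": 0.5, "phones": -0.1, "words": -0.2}

nbest = [
    Hypothesis("show all chips", -118.0, -210.0, 10, 3),
    Hypothesis("show all ships", -120.0, -200.0, 10, 3),
]
best = rerank(nbest, weights)[0]
```

Setting the `hmm` weight to zero corresponds to the case in which the optional first-pass score is left out of the combination.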

On the use of tied-mixture distributions

Kimball

Ostendorf

1993

Proceedings of the Workshop on Human Language Technology - HLT '93

Tied-mixture (or semi-continuous) distributions are an important tool for acoustic modeling, used in many high-performance speech recognition systems today. This paper provides a survey of the work in this area, outlining the different options available for tied mixture modeling, introducing algorithms for reducing training time, and providing experimental results assessing the trade-offs for speaker-independent recognition on the Resource Management task. Additionally, we describe an extension of tied mixtures to segment-level distributions.
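
As a rough illustration of what a tied-mixture observation density looks like, the sketch below evaluates a state's log-likelihood as a weighted sum over a Gaussian codebook shared by all states, with only the mixture weights being state-specific. It is not taken from the paper: the function names are hypothetical, and diagonal covariances are assumed purely to keep the example short.

```python
import math


def gaussian_diag_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def tied_mixture_loglik(x, codebook, state_weights):
    """Log-likelihood of x for one state of a tied-mixture model.

    codebook:      list of (mean, var) pairs shared across all states
    state_weights: this state's mixture weights, one per codebook entry
    """
    terms = [
        math.log(w) + gaussian_diag_logpdf(x, mean, var)
        for w, (mean, var) in zip(state_weights, codebook)
        if w > 0.0
    ]
    # Log-sum-exp over the components for numerical stability.
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# Two states sharing one codebook but using different weights.
codebook = [([0.0], [1.0]), ([3.0], [1.0])]
state_a = [0.9, 0.1]
state_b = [0.1, 0.9]
ll_a = tied_mixture_loglik([0.0], codebook, state_a)
ll_b = tied_mixture_loglik([0.0], codebook, state_b)
```

Because the Gaussians are shared, their densities for a given frame could be computed once and reused by every state; only the weighted sum differs from state to state.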

“…The algorithm reduces the search of more computationally expensive models, like the SSM, by eliminating very unlikely sentences in the first pass, performed with a less expensive model, such as the HMM. In this work, the BBN BYBLOS system [8] is used to generate 20 hypotheses per sentence.…”

Section: Cir Feasibility (mentioning)

confidence: 99%

“…The labeler, a context-dependent SSM, took the correct orthographic transcription, a pronunciation dictionary, and the speech for a sentence and used a dynamic programming algorithm to find the best phonetic alignment. The procedure used an initial labeling produced by the BBN BYBLOS system [8] as a guide, but allowed some variation in pronunciations, according to the dictionary, as well as in segmentation. The resulting alignment is flawed in comparison with carefully hand transcribed speech, as in the TIMIT database.…”

Section: Cir Feasibility (mentioning)

confidence: 99%

Recognition using classification and segmentation scoring

Kimball

Ostendorf

Rohlicek

1992

Proceedings of the Workshop on Speech and Natural Language - HLT '91

Traditional statistical speech recognition systems typically make strong assumptions about the independence of observation frames and generally do not make use of segmental information. In contrast, when the segmentation is known, existing classifiers can readily accommodate segmental information in the decision process. We describe an approach to connected word recognition that allows the use of segmental information through an explicit decomposition of the recognition criterion into classification and segmentation scoring. Preliminary experiments are presented, demonstrating that the proposed framework, using fixed length sequences of cepstral feature vectors for classification of individual phonemes, performs comparably to more traditional recognition approaches that use the entire observation sequence. We expect that performance gain can be obtained using this structure with additional, more general features.