2005
DOI: 10.1109/tsa.2005.852999

Automatic transcription of conversational telephone speech

Abstract: This paper discusses the Cambridge University HTK (CU-HTK) system for the automatic transcription of conversational telephone speech. A detailed discussion of the most important techniques in front-end processing, acoustic modelling and model training, and language and pronunciation modelling is presented. These include the use of conversation side based cepstral normalisation, vocal tract length normalisation, heteroscedastic linear discriminant analysis for feature projection, Minimum Phone Error Training and s…

Cited by 20 publications
(11 citation statements)
References 28 publications
“…In the speech community, there are two main associations that sell valuable speech databases for research and development: they are LDC (Linguistic Data Consortium: http://www.ldc.upenn.edu/) and ELRA (European Language Resources Association: http://www.elra.info/). In [11], there is a very good review of the state of art focusing on acoustic modeling for speech recognition.…”
Section: Speech Recognition (mentioning)
confidence: 99%
“…Recently, progress has been achieved in a number of particular domains of ASR including telephone speech [96], children's speech [220], noisy environments [58], speech emotion recognition [244] and meeting speech [23]. In the next subsection, we turn to the details of how an ASR system is built.…”
Section: The Scope and Variability Of Human Speech (mentioning)
confidence: 99%
“…There are a variety of commercial and open-source toolkits available for automated speech recognition. Several major universities focus entire programs on the research and development of these tools [10,11,3] and this work has quickly found its way into commercial development by such notable firms as Microsoft and Nuance. This work is heavily utilized (but not extended) in this paper.…”
Section: Automated Speech Recognition (mentioning)
confidence: 99%
“…Potential uses of these recordings include: Natural Language Understanding (NLU) Classifier Model Training [13], Speech Recognizer Model Training [10], Emotion Detection Model Training [6], Construction of Intelligent Agents [28], Speech Application Testing, Automated Feedback Loops & Machine Learning [28].…”
Section: Introduction (mentioning)
confidence: 99%