Interspeech 2017
DOI: 10.21437/interspeech.2017-405
English Conversational Telephone Speech Recognition by Humans and Machines

Abstract: One of the most difficult speech recognition tasks is accurate recognition of human-to-human communication. Advances in deep learning over the last few years have produced major speech recognition improvements on the representative Switchboard conversational corpus. Word error rates that just a few years ago were 14% have dropped to 8.0%, then 6.6% and most recently 5.8%, and are now believed to be within striking range of human performance. This then raises two issues - what IS human performance, and how far d…
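The figures quoted in the abstract (14% down to 5.8%) are word error rates: the number of word substitutions, deletions, and insertions needed to turn the system's hypothesis into the reference transcript, divided by the reference length. A minimal illustrative sketch of that computation via word-level Levenshtein distance follows; the actual Switchboard evaluations use the NIST scoring tools rather than this toy function.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion over 6 words
```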

Cited by 284 publications (239 citation statements); references 31 publications.
“…However, while the raw technical performance of contemporary spoken language systems has improved significantly in recent years [as evidenced by corporate giants such as Microsoft and IBM continuing to issue claim and counter-claim as to whose system has the lowest word error rates (Xiong et al., 2016; Saon et al., 2017)], in reality, users' experiences with such systems are often less than satisfactory. Not only can real-world conditions (such as noisy environments, strong accents, older/younger users or non-native speakers) lead to very poor speech recognition accuracy, but the 'understanding' exhibited by contemporary systems is rather shallow.…”
Section: Limitations of Current Systems
confidence: 99%
“…Recently, the development of deep learning technologies has led to great progress in the field of automatic speech recognition (ASR). Current state-of-the-art ASR systems are approaching human recognition performance levels [1,2] when speech is recorded with a close-talking microphone. However, recognition of speech recorded by distant microphones remains challenging because of acoustic interference such as noise, reverberation and interfering speakers.…”
Section: Introduction
confidence: 99%
“…Adversarial domain adaptation is suitable for the situation where no transcriptions or parallel adaptation data are available in either domain. It can also effectively suppress environment [12,13,14] and speaker [15,16] variability during domain adaptation. However, in the speech area, a parallel sequence of target-domain data can easily be simulated from the source-domain data such that the speech from both domains is frame-by-frame synchronized.…”
Section: Introduction
confidence: 99%