2021
DOI: 10.48550/arxiv.2108.00084
Preprint

The History of Speech Recognition to the Year 2030

Awni Hannun

Abstract: The decade from 2010 to 2020 saw remarkable improvements in automatic speech recognition. Many people now use speech recognition on a daily basis, for example to perform voice search queries, send text messages, and interact with voice assistants like Amazon Alexa and Siri by Apple. Before 2010 most people rarely used speech recognition. Given the remarkable changes in the state of speech recognition over the previous decade, what can we expect over the coming decade? I attempt to forecast the state of speech …

Cited by 5 publications (4 citation statements)
References 17 publications

“…Thirdly, as SSL implicitly learns a language model and other semantic information through the tasks it is subjected to solve [12], the generalizability of these models is only to the extent where data from a similar language or phonetic structure is introduced to it at finetuning. Thus, as correctly pointed out by [13], SSL for speech suffers from the problems of scale, and SSL generalizability can be improved with more efficient training procedures. Prior work for domain adaptation with self-supervised models mostly employ continued pre-training or combined data pre-training approaches [11].…”
Section: Introduction (mentioning)
confidence: 90%
“…Spontaneous speech analysis is a classic challenging task [9]. The conversational speech we are faced with presents specific difficulties, as it is often affected by dispersion, noise and incoherence.…”
Section: Linguistic Analysis (mentioning)
confidence: 99%
“…Auto-generated transcripts serve an integral part in providing equitable access of online video content to a wide variety of individuals and groups while voice based assistants enable users to avail a lot of online services with voice-based commands. In the past two decades, designing efficient ASRs have been an active area of research resulting in substantial advancement in the accuracy of these tools (Hannun 2021).…”
Section: Introduction (mentioning)
confidence: 99%