2022
DOI: 10.3390/s22041509

Improved Spoken Language Representation for Intent Understanding in a Task-Oriented Dialogue System

Abstract: Successful applications of deep learning technologies in the natural language processing domain have improved text-based intent classifications. However, in practical spoken dialogue applications, the users’ articulation styles and background noises cause automatic speech recognition (ASR) errors, and these may lead language models to misclassify users’ intents. To overcome the limited performance of the intent classification task in the spoken dialogue system, we propose a novel approach that jointly uses bot…

citations
Cited by 3 publications
(1 citation statement)
references
References 32 publications
“…To make matters worse, since most of the benchmark curated speech datasets [2][3][4][5] are built with very limited diversity, mostly representing healthy adults, it is challenging to accurately recognize the speech of children [9,10], the elderly [11][12][13], or those using dialects. Consequently, speech recognition performance suffers in highly variable scenarios, such as far-field or noisy environments [6][7][8]14,15], where the conditions or personal characteristics [16][17][18][19] degrade the performance compared with normal speech. In addition, recognizing new or trending words is important for ASR systems, but updating already built end-to-end ASR systems every time is time-consuming and resource-intensive.…”
Section: Introduction (mentioning)
confidence: 99%