2022 IEEE Spoken Language Technology Workshop (SLT) 2023
DOI: 10.1109/slt54892.2023.10023085

Weak-Supervised Dysarthria-Invariant Features for Spoken Language Understanding Using an FHVAE and Adversarial Training

Abstract: The scarcity of training data and the large speaker variation in dysarthric speech lead to poor accuracy and poor speaker generalization of spoken language understanding systems for dysarthric speech. Through work on the speech features, we focus on improving the model generalization ability with limited dysarthric data. Factorized Hierarchical Variational Auto-Encoders (FHVAE) trained unsupervisedly have shown their advantage in disentangling content and speaker representations. Earlier work showed that the d…

Cited by 1 publication (3 citation statements)
References 23 publications
“…Studies that emphasized speech patterns [23,27,36-42] were interested in the formation of words spoken, omissions in their patterns, and inclusion of interesting vocabulary during discourse. These studies were much aligned toward word representation and drawing meaning out of the same by leveraging other speech-independent features, such as the speakers' emotions.…”
Section: Mode Of Meaning Extraction Used (mentioning)
confidence: 99%
“…The studies that used vector encoding [15,38-43,50,51] used NLU-based models, such long short-term memory neural networks or combinations of gated recurrent unit and convolutional neural networks to achieve the tasks of dialogue assessment in dysarthric speech, language understanding, and semantic pattern tracking.…”
Section: Nature Of Speech Representations Used (mentioning)
confidence: 99%