Interspeech 2018
DOI: 10.21437/interspeech.2018-1124

Improved ASR for Under-resourced Languages through Multi-task Learning with Acoustic Landmarks

Abstract: Furui first demonstrated that the identity of both consonant and vowel can be perceived from the C-V transition; later, Stevens proposed that acoustic landmarks are the primary cues for speech perception, and that steady-state regions are secondary or supplemental. Acoustic landmarks are perceptually salient, even in a language one doesn't speak, and it has been demonstrated that non-speakers of the language can identify features such as the primary articulator of the landmark. These factors suggest a strategy…

Cited by 5 publications (6 citation statements)
References: 17 publications
“…As our approach addresses both, a contrastive learning task and speech recognition task, this paper is related to the field of multi-task learning [23,24]. Recent approaches to multi-task learning [25,26] solve the tasks by minimizing a loss, containing multiple terms, on the same supervised datasets. Whereas, in our method, the unsupervised and supervised losses are minimized on their respective datasets.…”
Section: Related Work (mentioning)
confidence: 99%
“…Landmark-based ASR has been shown to slightly reduce the WER of a large-vocabulary speech recognizer, but only in a rescoring paradigm using a very small test set [18]. Landmarks can reduce computational load for DNN/HMM hybrid models [12,13] and can improve recognition accuracy [11]. Previous works [11,12,13,19] annotated landmark positions mostly following experimental findings presented in [20,21].…”
Section: Acoustic Landmarks (mentioning)
confidence: 99%
“…Landmarks can reduce computational load for DNN/HMM hybrid models [12,13] and can improve recognition accuracy [11]. Previous works [11,12,13,19] annotated landmark positions mostly following experimental findings presented in [20,21]. Four different landmarks are defined to capture positions of vowel peak, glide valley in glide-like consonants, oral closure and oral release.…”
Section: Acoustic Landmarks (mentioning)
confidence: 99%