Interspeech 2019
DOI: 10.21437/interspeech.2019-2612

Ultrasound Tongue Imaging for Diarization and Alignment of Child Speech Therapy Sessions

Abstract: We investigate the automatic processing of child speech therapy sessions using ultrasound visual biofeedback, with a specific focus on complementing acoustic features with ultrasound images of the tongue for the tasks of speaker diarization and time-alignment of target words. For speaker diarization, we propose an ultrasound-based time-domain signal which we call estimated tongue activity. For word-alignment, we augment an acoustic model with low-dimensional representations of ultrasound images of the tongue, …

Cited by 5 publications (5 citation statements)
References 26 publications (35 reference statements)

“…These results meet our expectations that ultrasound and audio complement each other well and that additional out-of-domain training data is beneficial. Similar findings were reported on related tasks, such as speaker diarisation and word alignment of speech therapy sessions (Ribeiro et al, 2019b).…”
Section: Results (supporting)
confidence: 84%
“…Additionally, U-VBF can contribute to the automatic processing of speech therapy recordings. Recent work used ultrasound data to develop tongue contour extractors (Fabre et al, 2015), animate a tongue model (Fabre et al, 2017), automatically synchronise therapy recordings (Eshky et al, 2019), and for speaker diarisation and alignment of therapy sessions (Ribeiro et al, 2019b). There are, however, several challenges associated with the automatic processing of ultrasound tongue images (Stone, 2005; Ribeiro et al, 2019a).…”
Section: Ultrasound Visual Biofeedback (mentioning)
confidence: 99%
“…Ultrasound tongue imaging has been used in various applications, including speech therapy [6,7,8], language learning [9,10], phonetics studies [4], and the development of silent speech interfaces [11]. Previous work in the context of speech therapy used ultrasound data to develop tongue contour extractors [12], animate a tongue model [13], and automatically synchronise and process speech therapy recordings [14,15]. Additionally, speech recognition [16,17] and speech synthesis [18,19] from ultrasound images have been used in silent speech interfaces to restore spoken communication for users with voice impairments or to allow silent communication in situations where audible speech is undesirable.…”
Section: Introduction (mentioning)
confidence: 99%