Interspeech 2019
DOI: 10.21437/interspeech.2019-1593
Speech Organ Contour Extraction Using Real-Time MRI and Machine Learning Method

Abstract: Real-time MRI can be used to obtain videos that describe articulatory movements during running speech. For detailed analysis based on a large number of video frames, it is necessary to extract the contours of speech organs, such as the tongue, semi-automatically. The present study attempted to extract the contours of speech organs from videos using a machine learning method. First, an expert operator manually extracted the contours from the frames of a video to build training data sets. The learning operators,…

Cited by 8 publications (6 citation statements)
References 9 publications
“…Previous methodologies have implemented greater automation by leveraging powerful machine learning techniques. However, these make strong assumptions about the underlying data or rely on models which are trained from a narrow sample which reduces their generalisability to new research [12][13][14][15][16]. We have demonstrated that our pipeline generalises beyond the dataset for which it was designed.…”
Section: Discussion
confidence: 99%
“…Furthermore, those approaches which are available have not yet been demonstrated to generalise beyond the individual datasets for which they were developed. These methodologies typically involve automated or machine learning processes which are trained and tested against a narrow range of data, typically composed of a small number of speakers scanned at a single imaging centre [12][13][14][15][16][17][18]. The development of these techniques has sampled disproportionally from a single image repository [6].…”
Section: Introduction
confidence: 99%
“…Dlib [34] is widely used in facial landmark localization tasks, where it is an implementation of Ensemble of Regression Trees (ERT) presented in [35]. A recent study successfully uses Dlib to detect vocal tract landmarks [21]. During the training of the Dlib, we observe that it is important to initialize the VT landmarks carefully.…”
Section: A. Baseline Methods
confidence: 99%
“…Recently, Takemoto et al [21] utilize the Dlib machine learning library [22] to estimate the VT contours of five speech organs. Their approach performs an accurate tracking of articulatory movements on unseen data using a few manually annotated images in the training set.…”
Section: Introduction
confidence: 99%
“…Machine learning techniques were then developed with the advantage of learning from past examples, notably the "Active Shape Models" used by Labrunie et al [22]. Other machine learning techniques have been used [30], but in the last few years, deep learning, particularly Convolutional Neural Networks (CNNs), have brought significant progress in terms of performance. Initially proposed for tracking the tongue contour in ultrasound images with autoencoders [31], these techniques have been used for MRI images with the help of U-Net [32] as we did [21].…”
Section: Tracking Articulators
confidence: 99%