Deep-learning-based segmentation of the vocal tract and articulators in real-time magnetic resonance images of speech

Ruthven, Matthieu; Miquel, Marc E.; King, Andrew P.

doi:10.1016/j.cmpb.2020.105814

Cited by 13 publications

(20 citation statements)

References 28 publications


“…The proposed framework includes a deep-learning-based method to estimate segmentations of the following six anatomical features in the image pair: the head, soft palate, jaw, tongue, vocal tract and tooth space. This method is described in [56] and consists of two steps. First, segmentations of the six anatomical features in the image pair are estimated using a pre-trained CNN.…”

Section: Methods (mentioning)

confidence: 99%

“…Second, a connected-component-based post-processing step is performed to remove anatomically impossible regions from the segmentations. For full information about the segmentation method, the reader is referred to [56].…”

Section: Methods (mentioning)

confidence: 99%
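
The connected-component post-processing mentioned in the statement above can be illustrated with a minimal sketch. The quoted text does not say which regions count as anatomically impossible, so this example assumes the simplest rule: for one anatomical feature, keep only the largest 4-connected foreground region and discard the rest. The function name `keep_largest_component` and the list-of-lists mask format are hypothetical and are not taken from [56].

```python
from collections import deque


def keep_largest_component(mask):
    """Return a copy of a binary mask that keeps only its largest region.

    `mask` is a list of rows containing 0/1 values. Regions are 4-connected.
    Assumption: every smaller region is treated as a spurious segmentation.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    best = []  # pixel coordinates of the largest region found so far

    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill collects one connected region.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component

    cleaned = [[0] * cols for _ in range(rows)]
    for y, x in best:
        cleaned[y][x] = 1
    return cleaned


example = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],  # the isolated pixel on the right is a spurious region
    [0, 0, 0, 0],
]
cleaned_example = keep_largest_component(example)
```

In a multi-class setting the same function would be applied to each feature's binary mask separately; whether the cited method does this is not stated in the quoted text.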

“…Several methods to segment articulators in dynamic 2D MR images of the vocal tract during speech have been developed [54], [55], [56], [57], [58], [59], [60], [61], [62], [63], [64]. However, only one of these fully segments several groups of articulators in the images [56].…”

Section: Introduction (mentioning)

confidence: 99%

“…A metric based on velopharyngeal closure has been proposed and used to evaluate the accuracy of a method to segment dynamic 2D MR images of the vocal tract during speech [56]. This metric quantifies how many of the velopharyngeal closures in the ground-truth (GT) segmentations occur in the estimated segmentations, and is calculated by comparing corresponding consecutive segmentations in the two series.…”

Section: Introduction (mentioning)

confidence: 99%
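
The closure-based metric described in the statement above can be sketched as follows. The quoted text does not define how a closure is detected in a single segmentation or how matches are scored, so this example makes two assumptions: each frame has already been reduced to a boolean "closed" flag, and a GT closure (a maximal run of consecutive closed frames) counts as found if the estimated series is closed in at least one frame of that run. The names `closure_runs` and `closures_detected` are hypothetical.

```python
def closure_runs(closed):
    """Return (start, end) pairs, end exclusive, for each run of True flags."""
    runs = []
    start = None
    for i, flag in enumerate(closed):
        if flag and start is None:
            start = i  # a new closure begins at this frame
        elif not flag and start is not None:
            runs.append((start, i))  # the closure ended on the previous frame
            start = None
    if start is not None:
        runs.append((start, len(closed)))  # closure lasts to the final frame
    return runs


def closures_detected(gt_closed, est_closed):
    """Count GT closures that also appear in the estimated series.

    Returns (number detected, number of GT closures).
    Assumption: one overlapping closed frame is enough to count as detected.
    """
    if len(gt_closed) != len(est_closed):
        raise ValueError("both series must have the same number of frames")
    runs = closure_runs(gt_closed)
    detected = sum(1 for start, end in runs if any(est_closed[start:end]))
    return detected, len(runs)


gt = [False, True, True, False, False, True, False, True]
est = [False, False, True, False, False, False, False, True]
result = closures_detected(gt, est)  # (2, 3): the middle GT closure is missed
```

A stricter variant could require the estimated closure to cover a minimum fraction of the GT run; the quoted text does not say which criterion the original metric uses.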


A segmentation-informed deep learning framework to register dynamic two-dimensional magnetic resonance images of the vocal tract during speech

Ruthven, Miquel, King

2023

Biomedical Signal Processing and Control

Self Cite


“…Furthermore, those approaches which are available have not yet been demonstrated to generalise beyond the individual datasets for which they were developed. These methodologies typically involve automated or machine learning processes which are trained and tested against a narrow range of data, typically composed of a small number of speakers scanned at a single imaging centre [12][13][14][15][16][17][18]. The development of these techniques has sampled disproportionally from a single image repository [6].…”

Section: Introduction (mentioning)

confidence: 99%

An open-source toolbox for measuring vocal tract shape from real-time magnetic resonance images

Belyk, Carignan, McGettigan

2021

Preprint


Real-time magnetic resonance imaging is a technique that provides high-contrast videographic data of the vocal tract, allowing researchers to observe the internal structures that shape the sounds of speech. However, structural features need to be extracted from these vocal tract images to make them useful to researchers. We have developed a semi-automated processing pipeline that produces outlines of the vocal tract to quantify vocal tract morphology. Our approach uses simple tissue classification constrained to pixels that analysts have identified as likely to contain the vocal tract and surrounding tissue. This approach is supplemented with multiple opportunities for the analyst to intervene in order to ensure that outputs are robust to errors. Although this approach is more labour intensive than more fully automated alternatives, these costs are offset by the benefits of improving the quality of measurements. We demonstrate that this pipeline can be generalised to a range of datasets and that it remains reliable across analysts, particularly among analysts with vocal tract expertise. The pipeline’s reliance on user input presents a challenge to scalability if applied to very large datasets. Measurements produced by this pipeline could provide a broader scope of training data for fully automated methods in an effort to improve their generalisability.


Tongue model construction based on ultrasound images with image processing and deep learning method

2022
