Silent speech recognition (SSR) converts non-audio information such as articulatory movements into text. SSR has the
potential to enable persons with laryngectomy to communicate through natural spoken expression. Current SSR systems have largely
relied on speaker-dependent recognition models. The high degree of variability in articulatory patterns across speakers has been a barrier to developing effective speaker-independent SSR approaches. Speaker-independent approaches, however, are critical for reducing the amount of training data required from each speaker. In this paper, we investigate speaker-independent SSR from the movements of flesh points on the tongue and lips, using articulatory normalization methods that reduce inter-speaker variation.
To minimize across-speaker physiological differences of the articulators, we propose Procrustes matching-based articulatory normalization, which removes locational, rotational, and scaling differences.
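As an illustration, the following is a minimal NumPy sketch of the classical (orthogonal) Procrustes alignment underlying this kind of normalization; the function and variable names are ours, and each articulatory shape is assumed to be a (K, 2) matrix of midsagittal sensor coordinates.

```python
import numpy as np

def procrustes_normalize(shape, reference):
    """Align one articulatory shape (K flesh points x 2 coords) to a
    reference shape by removing translation, scale, and rotation.

    Both inputs are (K, 2) arrays of midsagittal sensor coordinates.
    Returns the normalized shape in the reference frame.
    """
    # 1. Remove locational differences: center both shapes at the origin.
    x = shape - shape.mean(axis=0)
    ref = reference - reference.mean(axis=0)

    # 2. Remove scaling differences: rescale to unit Frobenius norm.
    x /= np.linalg.norm(x)
    ref /= np.linalg.norm(ref)

    # 3. Remove rotational differences: find the rotation R minimizing
    #    ||x @ R - ref||_F via SVD of the cross-covariance matrix
    #    (the orthogonal Procrustes problem).
    u, _, vt = np.linalg.svd(x.T @ ref)
    r = u @ vt
    # Guard against reflections; keep a proper rotation (det = +1).
    if np.linalg.det(r) < 0:
        u[:, -1] *= -1
        r = u @ vt

    return x @ r
```

In practice, the alignment parameters would be estimated once per speaker (for example, from that speaker's mean shape) and then applied to every frame of the articulatory trajectory.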
To further normalize the articulatory data, we apply feature-space maximum likelihood linear regression (fMLLR) and i-vectors.
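The sketch below illustrates only how these two normalizations act on the feature stream, under our own naming: fMLLR is a per-speaker affine transform of the feature vectors (its likelihood-based estimation under the recognition model is omitted), and the i-vector is a fixed-length speaker embedding appended to every frame (its extraction, typically via a total-variability model, is likewise assumed to be done elsewhere).

```python
import numpy as np

def apply_speaker_normalization(frames, fmllr_A, fmllr_b, ivector):
    """Illustrative per-speaker feature normalization.

    frames:   (T, D) articulatory feature matrix for one speaker
    fmllr_A:  (D, D) fMLLR transform matrix, estimated elsewhere by
              maximizing likelihood under the recognition model
    fmllr_b:  (D,)   fMLLR bias term
    ivector:  (I,)   fixed-length speaker embedding

    Returns (T, D + I) features: fMLLR-transformed frames with the
    speaker i-vector appended to every frame.
    """
    # fMLLR is an affine transform of each feature vector; only its
    # application is shown here -- the estimation step is omitted.
    transformed = frames @ fmllr_A.T + fmllr_b

    # i-vector normalization: concatenate the same low-dimensional
    # speaker embedding to every frame so the recognition model can
    # condition on speaker identity.
    tiled = np.tile(ivector, (frames.shape[0], 1))
    return np.concatenate([transformed, tiled], axis=1)
```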
We adopt a bidirectional long short-term memory recurrent neural network (BLSTM) as the articulatory model, since it effectively models articulatory movements together with their long-range history.
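A minimal PyTorch sketch of such a BLSTM articulatory model is shown below; the class name and all layer sizes are illustrative rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class ArticulatoryBLSTM(nn.Module):
    """Minimal BLSTM articulatory model: maps a sequence of normalized
    flesh-point feature vectors to per-frame label posteriors."""

    def __init__(self, input_dim=24, hidden_dim=128, num_layers=2,
                 num_classes=40):
        super().__init__()
        # The bidirectional LSTM reads the articulatory trajectory in
        # both directions, giving every frame long-range left and right
        # context -- the "long-range articulatory history".
        self.blstm = nn.LSTM(input_dim, hidden_dim, num_layers,
                             batch_first=True, bidirectional=True)
        # 2 * hidden_dim: forward and backward states are concatenated.
        self.out = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, x):
        # x: (batch, frames, input_dim) normalized articulatory features
        h, _ = self.blstm(x)
        return self.out(h)  # (batch, frames, num_classes) logits

# Example: a batch of 4 sequences, 200 frames each, 24-D features.
model = ArticulatoryBLSTM()
logits = model(torch.randn(4, 200, 24))
```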
A silent speech data set with flesh-point movements was collected using an electromagnetic articulograph (EMA) from twelve healthy and two laryngectomized English speakers. Experimental results showed the effectiveness of our speaker-independent SSR approaches on both healthy and laryngectomized speakers. In addition, the BLSTM outperformed a standard deep neural network (DNN). The best performance was obtained by the BLSTM with all three normalization approaches combined.