Pediatric vowel productions are notoriously hard to quantify reliably. Listener judgments and transcription are subjective, show low intra- and interrater reliability, and are influenced by listener bias. Formant estimation may offer more objectivity, but child speech poses acoustic challenges. Children have less refined articulatory control and make more production errors. Relative to adult speech, child speech has higher fundamental frequencies, wider formant bandwidths, more variable formant values, and increased subglottal coupling. Historically, pediatric formants have been estimated manually, which is laborious and time-consuming. In recent years, automation tools have been developed to speed up the process. However, these tools have not been widely tested on pediatric speech samples. Even more critically, they have not been tested on diverse children. This study uses speech samples from children of color, children with disabilities, and children with both of these identities to compare three automation tools: SpeechMark® (Boyce et al., 2012), Fast Track™ (Barreda, 2021), and a custom Praat® (Boersma & Weenink, 2021) script written by the first author. Outcomes of each tool will be compared and contrasted. The discussion will review the considerations, benefits, and tradeoffs of each automation tool when working with diverse pediatric speech samples.
Speakers with voice disorders frequently report reduced intelligibility in ordinary communication situations. This effect is typically attributed to reduced loudness; however, other source/vocal tract interactions may be at work. The acoustic landmark theory of speech perception postulates that specific acoustic events, called “landmarks,” contain particularly salient information about acoustic cues used by listeners. The current study examined acoustic profiles of dysphonic speech with the publicly available landmark-based automatic speech analysis software, SpeechMark™. In this study, we focused on burst landmarks, which aim to identify onsets and offsets of affricate/stop bursts. The study tested two hypotheses: (1) normal and dysphonic speech samples differ in the number of burst landmarks because laryngeal pathology affects the consistency of airflow control, and (2) the number of burst landmarks will correlate with cepstral peak prominence values, which have been shown to correlate well with perceptual judgments of dysphonia severity. Speech samples from 36 normal and 33 dysphonic speakers in the Kay Elemetrics Disordered Voice Database were analyzed. Results will be discussed in the context of clinical assessment of intelligibility for dysphonic voices.