Acoustic cues related to the voice source, including harmonic structure and spectral tilt, were examined for relevance to prosodic boundary detection. The measurements considered here comprise five categories: duration, pitch, harmonic structure, spectral tilt, and amplitude. Distributions of the measurements and statistical analysis show that the measurements may be used to differentiate between prosodic categories. Detection experiments on the Boston University Radio Speech Corpus show equal error detection rates around 70% for accent and boundary detection, using only the acoustic measurements described, without any lexical or syntactic information. Further investigation of the detection results shows that duration and amplitude measurements, and, to a lesser degree, pitch measurements, are useful for detecting accents, while all voice source measurements except pitch measurements are useful for boundary detection.
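The abstract reports results at an equal error operating point but does not say how that point is computed. As a minimal sketch, assuming it means the decision threshold at which the miss rate and the false-alarm rate coincide, it could be located as follows. The function name `equal_error_point` and the scores and labels are hypothetical illustrations, not data or code from the paper.

```python
def equal_error_point(scores, labels):
    """Return (threshold, equal_error_rate) for a binary detector.

    scores: detector outputs, higher means "more likely an accent/boundary".
    labels: truthy for true events, falsy otherwise.
    Assumption: a sample is detected when its score >= threshold.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    best = None
    for t in sorted(set(scores)):
        miss = sum(s < t for s in pos) / len(pos)          # missed events
        false_alarm = sum(s >= t for s in neg) / len(neg)  # spurious detections
        gap = abs(miss - false_alarm)
        if best is None or gap < best[0]:
            best = (gap, t, miss, false_alarm)

    _, threshold, miss, false_alarm = best
    # With finite data the two rates may not match exactly, so average them.
    return threshold, (miss + false_alarm) / 2


# Made-up scores for illustration only.
demo_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
demo_labels = [0, 0, 0, 1, 0, 1, 1, 1]
demo_threshold, demo_eer = equal_error_point(demo_scores, demo_labels)
print(demo_threshold, demo_eer, 1 - demo_eer)
```

Under this reading, the detection rate at the equal error point is one minus the returned error rate.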
This paper investigates the effectiveness of measures related to vocal tract characteristics in classifying normal and pathological speech. Unlike conventional approaches, which focus mainly on features related to the vocal source, vocal tract characteristics are examined to determine whether interaction effects between the vocal folds and the vocal tract can be used to detect pathological speech. In particular, this paper examines features related to formant frequencies to see whether vocal tract characteristics are affected by the nature of the vocal fold-related pathology. To test this hypothesis, stationary fragments of the vowel /aa/ produced by 223 normal subjects, 472 vocal fold polyp subjects, and 195 unilateral vocal cord paralysis subjects are analyzed. Based on acoustic-articulatory relationships, phonation in pathological subjects is found to be associated with measures correlated with a raised tongue body or an advanced tongue root. The Kruskal-Wallis test also shows that vocal tract-related features differ significantly between normal and pathological speech. Classification results demonstrate that combining the formant measurements with vocal fold-related features improves performance in differentiating vocal pathologies, including vocal polyps and unilateral vocal cord paralysis. This suggests that measures related to vocal tract characteristics may provide additional information for diagnosing vocal disorders.
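For readers unfamiliar with the Kruskal-Wallis test named above, the following is a minimal pure-Python sketch of the H statistic with the standard tie correction. With three groups, as in the study (normal, polyp, paralysis), the chi-square approximation has 2 degrees of freedom, for which the tail probability has the closed form exp(-H/2). The function names and the formant values are hypothetical and are not taken from the paper.

```python
import math
from itertools import chain


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H statistic for two or more groups."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)

    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


def p_value_three_groups(h):
    """Chi-square approximation, valid only for exactly three groups (df = 2)."""
    return math.exp(-h / 2)


# Made-up first-formant values in Hz, for illustration only.
normal = [710.0, 725.0, 698.0, 740.0]
polyp = [660.0, 672.0, 655.0, 690.0]
paralysis = [640.0, 668.0, 651.0, 630.0]
h_stat = kruskal_wallis_h(normal, polyp, paralysis)
print(h_stat, p_value_three_groups(h_stat))
```

The chi-square approximation is rough for groups this small; it is far more reasonable at the sample sizes reported in the abstract.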
This letter describes the perceptual relevance of using the temporal envelope to model the 4–7 kHz frequency band (the high band) of wideband speech signals. Based on theoretical work in psychoacoustics, we find that the temporal envelope can indeed serve as a perceptual cue for the high-band signal, i.e., a noiseless sound can be obtained if the temporal envelope is roughly preserved. Subjective listening tests verify that transparent quality can be obtained if the model is used for the 4.5–7 kHz band. The proposed model offers flexible scalability and reduces the cost of quantization in coding applications.
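The letter's actual model is not specified in this abstract. As a rough illustration of what "roughly preserving the temporal envelope" can mean, the sketch below measures a coarse envelope as the RMS of short frames and imposes it on a noise carrier. The frame length, the function names, and the test signal are all assumptions, and the noise here is not band-limited to the high band as a real coder would require.

```python
import math
import random


def frame_rms_envelope(signal, frame_len):
    """Coarse temporal envelope: RMS of each non-overlapping frame."""
    envelope = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        envelope.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return envelope


def shape_noise(envelope, frame_len, seed=0):
    """Generate Gaussian noise whose per-frame RMS follows the envelope."""
    rng = random.Random(seed)
    out = []
    for level in envelope:
        noise = [rng.gauss(0.0, 1.0) for _ in range(frame_len)]
        rms = math.sqrt(sum(x * x for x in noise) / frame_len)
        # Normalise each frame to unit RMS, then scale to the target level.
        out.extend(level * x / rms for x in noise)
    return out


# Hypothetical test signal: a decaying 5 kHz tone sampled at 16 kHz.
fs = 16000
tone = [math.exp(-t / 400.0) * math.sin(2 * math.pi * 5000 * t / fs)
        for t in range(1600)]
frame_len = 80  # 5 ms frames; an arbitrary choice for this sketch

env = frame_rms_envelope(tone, frame_len)
resynth = shape_noise(env, frame_len)
print(len(env), len(resynth))
```

Only one number per frame is kept, which hints at why an envelope-based description can be cheap to quantize and easy to scale by changing the frame length.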