Subjects were shown the terms of simple sentences in sequence (e.g., “A sparrow / is not / a vehicle”) and manually indicated whether the sentence was true or false. When the sentence form was affirmative (i.e., “X is a Y”), false sentences produced scalp potentials that were significantly more negative than those for true sentences, in the region of about 250 to 450 msec following presentation of the sentence object. In contrast, when the sentence form was negative (i.e., “X is not a Y”), it was the true statements that were associated with the ERP negativity. Since both the false‐affirmative and the true‐negative sentences consist of “mismatched” subject and object terms (e.g., sparrow / vehicle), it was concluded that the negativity in the potentials reflected a semantic mismatch between terms at a preliminary stage of sentence comprehension, rather than the falseness of the sentence taken as a whole. Similarities between the present effects of semantic mismatches and the N400 associated with incongruous sentences (Kutas & Hillyard, 1980) are discussed. The pattern of response latencies and of ERPs taken together supported a model of sentence comprehension in which negatives are dealt with only after the proposition to be negated is understood.
The purpose of this study was to examine several factors of vocal quality that might be affected by changes in vocal fold vibratory patterns. Four voice types were examined: modal, vocal fry, falsetto, and breathy. Three categories of analysis techniques were developed to extract source-related features from speech and electroglottographic (EGG) signals. Four factors were found to be important for characterizing the glottal excitations for the four voice types: the glottal pulse width, the glottal pulse skewness, the abruptness of glottal closure, and the turbulent noise component. The significance of these factors for voice synthesis was studied and a new voice source model that accounted for certain physiological aspects of vocal fold motion was developed and tested using speech synthesis. Perceptual listening tests were conducted to evaluate the auditory effects of the source model parameters upon synthesized speech. The effects of the spectral slope of the source excitation, the shape of the glottal excitation pulse, and the characteristics of the turbulent noise source were considered. Applications for these research results include synthesis of natural sounding speech, synthesis and modeling of vocal disorders, and the development of speaker independent (or adaptive) speech recognition systems.
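The four source factors above can be illustrated with the classic Rosenberg glottal-pulse model; this is not the new source model the paper develops (which is not specified here), only a well-known parametric pulse in which the open quotient sets pulse width, the speed quotient sets skewness, and the closing-phase length governs the abruptness of closure. A minimal sketch:

```python
import numpy as np

def rosenberg_pulse(n, open_quotient=0.6, speed_quotient=2.0):
    """Classic Rosenberg glottal pulse over one pitch period of n samples.

    open_quotient  : fraction of the period the glottis is open (pulse width)
    speed_quotient : opening-time / closing-time ratio (pulse skewness)
    A shorter closing phase gives a more abrupt glottal closure.
    """
    t = np.arange(n)
    T_open = open_quotient * n
    T_p = T_open * speed_quotient / (1.0 + speed_quotient)  # opening phase
    T_n = T_open - T_p                                      # closing phase
    g = np.zeros(n)
    rise = t < T_p
    g[rise] = 0.5 * (1.0 - np.cos(np.pi * t[rise] / T_p))
    fall = (t >= T_p) & (t < T_open)
    g[fall] = np.cos(np.pi * (t[fall] - T_p) / (2.0 * T_n))
    return g  # zero during the closed phase
```

Varying `open_quotient` and `speed_quotient` and listening to the resynthesized result is the kind of perceptual manipulation the listening tests describe.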
This paper is a pragmatic tutorial review of the cepstrum literature focusing on data processing. The power, complex, and phase cepstra are shown to be easily related to one another. Problems associated with phase unwrapping, linear phase components, spectrum notching, aliasing, oversampling, and extending the data sequence with zeros are discussed. The advantages and disadvantages of windowing the sampled data sequence, the log spectrum, and the complex cepstrum are presented. The influence of noise upon the data processing procedures is discussed throughout the paper, but is not thoroughly analyzed. The effects of various forms of liftering the cepstrum are described. The results obtained by applying whitening and trend removal techniques to the spectrum prior to the calculation of the cepstrum are also discussed. We have attempted to synthesize the results, procedures, and information pertinent to the many fields that are finding cepstrum analysis useful. In particular, we discuss the interpretation and processing of data in such areas as speech, seismic, and hydroacoustic applications. But we must caution the reader that the paper is heavily influenced by our own experiences; specific procedures that have been found useful in one field should not be considered as totally general to other fields. It is hoped that this review will be of value to those familiar with the field and will reduce the time required for those wishing to become familiar with it.
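As a concrete anchor for the review's terminology, the power cepstrum is simply the inverse FFT of the log power spectrum; the echo example and the small log floor below are illustrative choices, not from the paper:

```python
import numpy as np

def power_cepstrum(x, n_fft=None):
    """Power cepstrum: inverse FFT of the log power spectrum."""
    n = n_fft or len(x)
    spectrum = np.fft.fft(x, n)
    log_power = np.log(np.abs(spectrum) ** 2 + 1e-12)  # floor avoids log(0)
    return np.fft.ifft(log_power).real

# An echo at lag 50 ripples the log spectrum and therefore shows up as a
# cepstral peak at quefrency 50 (the classic echo-detection use).
x = np.zeros(512)
x[0] = 1.0
x[50] = 0.5
c = power_cepstrum(x)
peak = int(np.argmax(c[1 : len(c) // 2])) + 1  # skip the quefrency-0 term
```

Liftering, in this vocabulary, is just windowing `c` before transforming back; whitening and trend removal would be applied to `log_power` before the inverse FFT.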
We have investigated the relationship between various voice qualities and several acoustic measures made from the vowel /i/ phonated by subjects with normal voices and by patients with vocal disorders. Among the patients (pathological voices), five qualities were investigated: overall severity, hoarseness, breathiness, roughness, and vocal fry. Six acoustic measures were examined. With one exception, all measures were extracted from the residue signal obtained by inverse filtering the speech signal using the linear predictive coding (LPC) technique. A formal listening test was used to rate each pathological voice on each of the five qualities, and a second listening test rated the overall excellence of the normal voices; both used a scale of 1–7. Multiple linear regression analysis between the listening-test ratings and the acoustic measures was performed, with the prediction sum of squares (PRESS) as the selection criterion. Useful prediction equations of order two or less were obtained relating certain acoustic measures to the ratings of the pathological voices on each of the five qualities. The two most useful parameters for predicting vocal quality were the pitch amplitude (PA) and the harmonics-to-noise ratio (HNR). No acoustic measure could rank the normal voices.
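The pitch amplitude measure is described above only by name; one plausible reading, sketched here purely as an assumption, is the peak of the normalized autocorrelation of the LPC residue (a strongly periodic residue scores high, a noisy one low):

```python
import numpy as np

def lpc_coefficients(x, order):
    """LPC via the autocorrelation (Yule-Walker) method."""
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1 : order + 1])
    return np.concatenate(([1.0], -a))  # prediction-error (inverse) filter

def pitch_amplitude(x, order=12, lag_min=20):
    """Peak normalized autocorrelation of the LPC residue: one plausible
    reading of the paper's pitch amplitude (PA) measure, not its definition."""
    a = lpc_coefficients(x, order)
    residue = np.convolve(x, a)[: len(x)]  # inverse-filtered speech
    ac = np.correlate(residue, residue, mode="full")[len(residue) - 1 :]
    ac = ac / ac[0]
    return ac[lag_min : len(ac) // 2].max()

# Synthetic check: a periodic source through a decaying "vocal tract"
# impulse response should score high; white noise should score low.
rng = np.random.default_rng(0)
pulses = np.zeros(800)
pulses[::80] = 1.0                                  # 80-sample pitch period
voiced = np.convolve(pulses, 0.9 ** np.arange(50))[:800]
pa_voiced = pitch_amplitude(voiced)
pa_noise = pitch_amplitude(rng.standard_normal(800))
```

A harmonics-to-noise ratio would likewise compare periodic and aperiodic energy in the residue, but its exact formulation in the paper is not given here.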
The purpose of this research was to investigate the potential effectiveness of digital speech processing and pattern recognition techniques for the automatic recognition of gender from speech. In part I, Coarse Analysis [K. Wu and D. G. Childers, J. Acoust. Soc. Am. 90, 1828-1840 (1991)], various feature vectors and distance measures were examined to determine their appropriateness for recognizing a speaker's gender from vowels, unvoiced fricatives, and voiced fricatives. One recognition scheme based on feature vectors extracted from vowels achieved 100% correct recognition of the speaker's gender using a database of 52 speakers (27 male and 25 female). In this paper a detailed, fine analysis of the characteristics of vowels is performed, including formant frequencies, bandwidths, and amplitudes, as well as the speaker's fundamental frequency of voicing. The fine analysis used a pitch-synchronous closed-phase analysis technique. Detailed formant features, including frequencies, bandwidths, and amplitudes, were extracted by a closed-phase weighted recursive least-squares method that employed a variable forgetting factor (WRLS-VFF). The electroglottographic (EGG) signal was used to locate the closed-phase portion of the speech signal. A two-way statistical analysis of variance (ANOVA) was performed to test the differences between gender features. The relative importance of grouped vowel features was evaluated by a pattern recognition approach. Numerous interesting results were obtained, including the finding that the second formant frequency was a slightly better recognizer of gender than fundamental frequency, giving 98.1% versus 96.2% correct recognition, respectively. The statistical tests indicated that the spectra for female speakers had a steeper slope (or tilt) than those for male speakers. The results suggest that redundant gender information was embedded in the fundamental frequency and vocal tract resonance characteristics.
The feature vectors for female voices were observed to have higher within-group variations than those for male voices. The data in this study were also used to replicate portions of the Peterson and Barney [J. Acoust. Soc. Am. 24, 175-184 (1952)] study of vowels for male and female speakers.
The purpose of this research was to investigate the potential effectiveness of digital speech processing and pattern recognition techniques for the automatic recognition of gender from speech segments. In this paper "coarse" acoustic coefficients (autocorrelation, linear prediction, cepstrum, and reflection) were used to form test and reference templates for vowels, voiced fricatives, and unvoiced fricatives. The effects of different distance measures, filter orders, recognition schemes, and vowels and fricatives were comparatively assessed to determine their effectiveness for the task of gender recognition from speech segments. The results showed that most of the acoustic parameters worked well for gender recognition. A within-gender and within-subject averaging technique was important for generating appropriate test and reference templates. The Euclidean distance measure appeared to be the most robust as well as the simplest of the distance measures. The results from this study implied that the gender information is time invariant, phoneme independent, and speaker independent for a given gender. One recognition scheme achieved 100% correct speaker gender classification for a database of 52 talkers (27 male and 25 female). In part II of this paper [D.G. Childers and K. Wu, J. Acoust. Soc. Am. 90, 1841-1856 (1991); hereafter referred to as paper II] the detailed features of ten vowels that appeared responsible for distinguishing a speaker's gender were examined statistically. Included in paper II is a replication of part of the classical study of Peterson and Barney [J. Acoust. Soc. Am. 24, 175-184 (1952)] of vowel characteristics.
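The within-gender averaging and Euclidean-distance matching described above can be sketched as follows; the toy (f0, F2) features and cluster centers are hypothetical illustrations, not the paper's data or templates:

```python
import numpy as np

def gender_templates(features, labels):
    """Reference templates: the per-gender mean of the feature vectors
    (the within-gender averaging step the paper describes)."""
    return {g: features[labels == g].mean(axis=0) for g in np.unique(labels)}

def classify(vec, templates):
    """Assign the label whose template is nearest in Euclidean distance."""
    return min(templates, key=lambda g: np.linalg.norm(vec - templates[g]))

# Toy demo with hypothetical (f0 Hz, F2 Hz) features, not real data:
rng = np.random.default_rng(1)
male = rng.normal([120.0, 1200.0], [15.0, 80.0], size=(27, 2))
female = rng.normal([220.0, 2300.0], [25.0, 120.0], size=(25, 2))
X = np.vstack([male, female])
y = np.array(["m"] * 27 + ["f"] * 25)
templates = gender_templates(X, y)
```

In the paper the feature vectors are coarse acoustic coefficients (autocorrelation, linear prediction, cepstrum, or reflection) rather than raw formant values, but the template-matching structure is the same.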