Magnetic resonance images of the vocal tract during sustained production of the fricatives /s, ʃ, f, θ, z, ʒ, v, ð/ by four subjects are analyzed. Measurements of vocal-tract lengths and area functions, and morphological analyses of the vocal tract and tongue shapes for these sounds are presented. Interspeaker differences in area functions are found to be greater in the pharyngeal cavity than in the buccal cavity, with the nonstrident fricatives exhibiting greater differences than the strident ones. The anterior tongue body of the alveolar stridents exhibits concave cross-sectional shapes, while that of the postalveolars shows a relatively raised tongue body with flat or slightly convex cross-sectional shapes. The concave tongue shapes of the alveolars result in a more abrupt area function behind the constriction when compared to that of the postalveolars. Laminality or apicality of articulation is found to be speaker dependent. Moreover, a greater degree of anterior medial grooving and lateral linguapalatal contact is found in apical alveolar fricatives than in laminal ones. The posterior tongue body of all fricatives shows concave cross-sectional shapes. Voiced fricatives are characterized by larger pharyngeal volumes than the unvoiced fricatives due to tongue-root advancement. Tongue-shape asymmetries are found to be subject and, in some cases, sound dependent. © 1995 Acoustical Society of America.
The effects of age, sex, and vocal tract configuration on the glottal excitation signal in speech are only partially understood, yet understanding these effects is important for both recognition and synthesis of speech as well as for medical purposes. In this paper, three acoustic measures related to the voice source are analyzed for five vowels from 3145 CVC utterances spoken by 335 talkers (8-39 years old) from the CID database [Miller et al., Proceedings of ICASSP, 1996, Vol. 2, pp. 849-852]. The measures are: the fundamental frequency (F0); the difference between the "corrected" (denoted by an asterisk) magnitudes of the first two spectral harmonics, H1* - H2* (related to the open quotient); and the difference between the "corrected" magnitudes of the first spectral harmonic and the third formant peak, H1* - A3* (related to source spectral tilt). The correction compensates for the influence of formant frequencies on spectral magnitude estimation. Experimental results show that the three acoustic measures depend to varying degrees on age and vowel. Age dependencies are more prominent for male talkers, while vowel dependencies are more prominent for female talkers, suggesting a greater vocal tract-source interaction. All talkers show a dependency of F0 on sex and on F3, and of H1* - A3* on vowel type. For low-pitched talkers (F0 ≤ 175 Hz), H1* - H2* is positively correlated with F0, while for high-pitched talkers, H1* - H2* is dependent on F1 or vowel height. For high-pitched talkers there were no significant sex dependencies of H1* - H2* and H1* - A3*. The statistical significance of these results is shown.
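As a rough illustration of the harmonic-difference measure described above, the sketch below computes an uncorrected H1 - H2 on a synthetic two-harmonic signal. It is not the procedure used in the paper: the formant correction that yields H1* - H2* is omitted, F0 is assumed to be known in advance, and the function names, sampling rate, and harmonic amplitudes are all hypothetical choices made for the example.

```python
import math

def harmonic_level_db(signal, fs, freq):
    """Magnitude in dB of a single DFT bin evaluated at `freq` Hz."""
    re = sum(s * math.cos(2 * math.pi * freq * n / fs) for n, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq * n / fs) for n, s in enumerate(signal))
    return 20 * math.log10(math.hypot(re, im))

def h1_minus_h2(signal, fs, f0):
    """Uncorrected H1 - H2: level at F0 minus level at 2*F0 (no formant correction)."""
    return harmonic_level_db(signal, fs, f0) - harmonic_level_db(signal, fs, 2 * f0)

# Hypothetical test signal: 20 full periods of a 200 Hz tone whose second
# harmonic has half the amplitude of the first.
fs, f0, n_samples = 8000, 200, 800
signal = [
    1.0 * math.sin(2 * math.pi * f0 * n / fs)
    + 0.5 * math.sin(2 * math.pi * 2 * f0 * n / fs)
    for n in range(n_samples)
]

h1h2 = h1_minus_h2(signal, fs, f0)
print(round(h1h2, 2))  # 6.02, i.e. 20*log10(1.0 / 0.5)
```

On real speech the harmonic levels would be read from a windowed short-time spectrum near an estimated F0, and the formant correction mentioned in the abstract would be applied before taking the difference.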
This study examines relationships between external face movements, tongue movements, and speech acoustics for consonant-vowel (CV) syllables and sentences spoken by two male and two female talkers with different visual intelligibility ratings. The questions addressed are how relationships among measures vary by syllable, whether talkers who are more intelligible produce greater optical evidence of tongue movements, and how the results for CVs compare with those for sentences. Results show that the prediction of one data stream from another is better for C/a/ syllables than for C/i/ and C/u/ syllables. Across the different places of articulation, lingual places result in better predictions of one data stream from another than do bilabial and glottal places. Results vary from talker to talker; interestingly, high rated intelligibility does not result in high predictions. In general, predictions for CV syllables are better than those for sentences.
Magnetic resonance images of the vocal tract during the sustained phonation of /l/ (both dark and light allophones) by four native American English talkers are employed for measuring lengths, area functions, and cavity volumes and for the analysis of 3-D vocal tract and tongue shapes. Electropalatography contact profiles are used for studying inter- and intra-talker variabilities and as a source of converging evidence for the magnetic resonance imaging study. The general 3-D tongue body shapes for both allophones of /l/ are characterized by a linguo-alveolar contact together with inward lateral compression and convex cross sections of the posterior tongue body region. The lateral compression along the midsagittal plane enables the creation of flow channels along the sides of the tongue. The bilateral flow channels exhibit somewhat different areas, a characteristic which is talker-dependent. Dark /l/s show smaller pharyngeal areas than the light varieties due to tongue-root retraction and/or posterior tongue body raising. The acoustic implications of the observed geometries are discussed.
Recent advances in physiological data collection methods have made it possible to test the accuracy of predictions against speaker-specific vocal tracts and acoustic patterns. Vocal tract dimensions for /r/ derived via magnetic-resonance imaging (MRI) for two speakers of American English [Alwan, Narayanan, and Haker, J. Acoust. Soc. Am. 101, 1078-1089 (1997)] were used to construct models of the acoustics of /r/. Because previous models have not sufficiently accounted for the very low F3 characteristic of /r/, the aim was to match formant frequencies predicted by the models to the full range of formant frequency values produced by the speakers in recordings of real words containing /r/. In one set of experiments, area functions derived from MRI data were used to argue that the Perturbation Theory of tube acoustics cannot adequately account for /r/, primarily because predicted locations did not match speakers' actual constriction locations. Different models of the acoustics of /r/ were tested using the Maeda computer simulation program [Maeda, Speech Commun. 1, 199-299 (1982)]; the supralingual vocal-tract dimensions reported in Alwan et al. were found to be adequate at predicting only the highest of attested F3 values. By using (1) a recently developed adaptation of the Maeda model that incorporates the sublingual space as a side branch from the front cavity, and by including (2) the sublingual space as an increment to the dimensions of the front cavity, the mid-to-low values of the speakers' F3 range were matched. Finally, a simple tube model with dimensions derived from MRI data was developed to account for cavity affiliations. This confirmed F3 as a front cavity resonance, and variations in F1, F2, and F4 as arising from mid- and back-cavity geometries. Possible trading relations for F3 lowering based on different acoustic mechanisms for extending the front cavity are also proposed.
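The link between front-cavity length and a low F3 can be illustrated with the textbook quarter-wavelength approximation for a uniform tube closed at one end and open at the other. This is only a sketch under that simplifying assumption, not the tube model or the Maeda-based simulation used in the paper; the speed of sound and both cavity lengths below are hypothetical illustrative values, not measurements from the MRI data.

```python
SPEED_OF_SOUND = 350.0  # m/s; assumed approximate value for warm, humid air

def quarter_wave_resonance(length_m, mode=1):
    """Resonance (Hz) of a uniform tube closed at one end and open at the other."""
    return (2 * mode - 1) * SPEED_OF_SOUND / (4.0 * length_m)

# Hypothetical lengths, for illustration only.
front_cavity_m = 0.045        # front cavity alone
sublingual_extra_m = 0.010    # sublingual space treated as added length

f_front = quarter_wave_resonance(front_cavity_m)
f_extended = quarter_wave_resonance(front_cavity_m + sublingual_extra_m)

print(round(f_front), round(f_extended))  # 1944 1591
```

Under this idealization, lengthening the front cavity lowers its lowest resonance, which is the direction of the effect the abstract attributes to including the sublingual space.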
Acoustic feedback is a problem in hearing aids that contain a substantial amount of gain, hearing aids that are used in conjunction with vented or open molds, and in-the-ear hearing aids. Acoustic feedback is annoying and reduces the maximum usable gain of hearing-aid devices. This paper studies analytically the steady-state convergence behavior of LMS-based adaptive algorithms when used in continuous adaptation to reduce acoustic feedback. A bias is found in the adaptive filter's estimate of the hearing-aid acoustic feedback path. Methods for reducing this bias and producing an improved estimate of the acoustic feedback path are analyzed and compared. It is shown that by the use of a delay in the forward or cancellation paths of the hearing aid plant, and for representative feedback paths, it is possible to reduce this bias by more than 15 dB.
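For readers unfamiliar with LMS adaptation, the following minimal sketch shows the basic weight update on an open-loop system-identification task: a white-noise probe drives a known FIR path and the filter learns its coefficients. It deliberately does not model the closed-loop hearing-aid plant, so it does not reproduce the bias or the delay-based remedies analyzed in the paper. The path coefficients, step size, and function names are hypothetical.

```python
import random

def fir_filter(h, x):
    """Convolve input x with FIR coefficients h (zero initial state)."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]

def lms_identify(x, d, num_taps, mu):
    """Adapt FIR weights w so that the filtered input tracks the desired signal d."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # most recent input sample first
    for x_n, d_n in zip(x, d):
        buf = [x_n] + buf[:-1]
        y_n = sum(wk * bk for wk, bk in zip(w, buf))        # filter output
        e_n = d_n - y_n                                      # estimation error
        w = [wk + mu * e_n * bk for wk, bk in zip(w, buf)]   # LMS update
    return w

rng = random.Random(0)
true_path = [0.0, 0.5, -0.3, 0.1]  # hypothetical feedback-path impulse response
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # white-noise probe signal
d = fir_filter(true_path, x)

w_hat = lms_identify(x, d, num_taps=len(true_path), mu=0.01)
max_error = max(abs(a - b) for a, b in zip(w_hat, true_path))
```

In this noise-free open-loop setting the estimate converges to the true path. In continuous adaptation inside a hearing aid, the input to the adaptive filter is correlated with the incoming signal, which is the situation in which the abstract reports a biased estimate.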
Magnetic resonance images of the vocal tract during sustained production of [symbol: see text] by four native American English talkers are employed for measuring vocal-tract dimensions and for morphological analysis of the 3-D vocal tract and tongue shapes. Electropalatography contact profiles are used for studying inter- and intra-talker variabilities. The vocal tract during the production of [symbol: see text] appears to be characterized by three cavities due to the presence of two supraglottal constrictions: the primary one in the oral cavity, and a secondary one in the pharyngeal cavity. All subjects show a large volume anterior to the oral constriction, which results from an inward-drawn tongue body, an anterior tongue body that is characterized by convex cross sections, and a concave posterior tongue body shape. Inter-subject variabilities are observed in the oral-constriction location and the way the constriction is formed. No systematic differences are found between the 3-D vocal tract and tongue shapes of word-initial and syllabic [symbol: see text]s. Tongue-shaping mechanisms for these sounds and their acoustic implications are discussed.
The great majority of current voice technology applications rely on acoustic features, such as the widely used MFCC or LP parameters, which characterize the vocal tract response. Nonetheless, the major source of excitation, namely the glottal flow, is expected to convey useful complementary information. The glottal flow is the airflow passing through the vocal folds at the glottis. Unfortunately, glottal flow analysis from speech recordings requires specific and complex processing operations, which explains why it has been generally avoided. This paper gives a comprehensive overview of techniques for glottal source processing. Starting from analysis tools for pitch tracking, detection of glottal closure instant, estimation and modeling of glottal flow, this paper discusses how these tools and techniques might be properly integrated in various voice technology applications.