Vowels are typically described in terms of their low-frequency resonances, which are presumed to provide the information required for identification. Classically, the study of vowel perception has focused on the lowest formant frequencies, typically F1, F2, and F3. Lehiste and Peterson [Phonetica 4, 161-177 (1959)] investigated identification accuracy for naturally produced male vowels composed of various amounts of low- and high-frequency content. Results showed near-chance identification performance for vowel segments containing only spectral information above 3.5 kHz, and the authors concluded that high-frequency information was of minor importance for vowel identification. The current experiments report identification accuracy for high-pass filtered vowels produced by two male, two female, and two child talkers, using both between- and within-subject designs. Identification performance remained significantly above chance for the majority of vowels even after high-pass filtering removed spectral content below 3.0-3.5 kHz. Additionally, the filtered vowels with the highest fundamental frequencies (those of the child talkers) often yielded the highest identification accuracy. Linear discriminant function analysis mirrored perceptual performance when based on spectral peak information between 3 and 12 kHz.
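To make the stimulus manipulation concrete, the sketch below shows one way to high-pass filter a vowel segment so that only energy above roughly 3.5 kHz remains, in the spirit of the filtering described above. The filter type, order, and cutoff here are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def highpass_vowel(signal, fs, cutoff_hz=3500.0, order=8):
    """Zero-phase Butterworth high-pass filter for a 1-D audio signal."""
    sos = butter(order, cutoff_hz, btype="highpass", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)

# Toy usage: a vowel-like sum of harmonics; everything below ~3.5 kHz is removed.
fs = 44100
t = np.arange(0, 0.3, 1.0 / fs)
vowel = sum(np.sin(2 * np.pi * f * t) for f in (250, 700, 1200, 2500, 4200))
filtered = highpass_vowel(vowel, fs)
```

A zero-phase (forward-backward) filter is used here so the filtering itself introduces no temporal smearing of the vowel segment.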
This study evaluated performance on a gender identification task and a temporal resolution task among active musicians and age-matched non-musicians. Brief (50- and 100-ms) vowel segments produced by four adult male and four adult female speakers were spectro-temporally degraded under various parameter settings and presented to both groups for gender identification. Gap detection thresholds were measured using the gaps-in-noise (GIN) test. Contrary to the stated hypothesis, no significant difference in gender identification was observed between the musician and non-musician listeners. A significant difference was observed on the temporal resolution task, however, with the musician group achieving gap detection thresholds on the GIN test approximately 2 ms shorter than those of their non-musician counterparts. These results provide evidence for potential benefits of musical training on temporal processing, with implications for the processing of speech in degraded listening environments and for the processing of fine-grained temporal aspects of the speech signal. The results also support the GIN test as an instrument sensitive to temporal processing differences between active musicians and non-musicians.
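As a rough illustration of the temporal resolution task, the sketch below generates a simplified gaps-in-noise style stimulus: a broadband noise burst with a single silent gap at its midpoint. This is a simplification for illustration only; the clinical GIN test embeds multiple gaps of varying durations within longer noise segments, and the adaptive threshold procedure is omitted here.

```python
import numpy as np

def gin_stimulus(fs=44100, noise_ms=500.0, gap_ms=5.0, rng=None):
    """Broadband noise of noise_ms duration with one silent gap of gap_ms at its midpoint."""
    rng = np.random.default_rng() if rng is None else rng
    n_total = int(fs * noise_ms / 1000.0)
    n_gap = int(fs * gap_ms / 1000.0)
    noise = rng.standard_normal(n_total)
    start = (n_total - n_gap) // 2        # center the gap in the noise burst
    noise[start:start + n_gap] = 0.0      # the silent gap the listener must detect
    return noise

# A ~2 ms gap, on the order of the musicians' threshold advantage reported above.
stim = gin_stimulus(gap_ms=2.0)
```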
Cumulatively, the results of both experiments provide evidence that normal-hearing listeners can use information from the very high-frequency region (above 4 to 5 kHz) of the speech signal to identify talker gender. These findings are at variance with current assumptions about the perceptual information for talker gender available within this frequency region, and they corroborate and extend previous studies of the use of high-frequency speech energy in perceptual tasks. The findings have potential implications for the study of information contained within the high-frequency region of the speech spectrum and for the role this region may play in navigating the auditory scene, particularly when the low-frequency portion of the spectrum is masked by environmental noise or for listeners with substantial low-frequency hearing loss and better high-frequency sensitivity (i.e., a reverse slope hearing loss).