The purpose of this study was to determine how well listeners can identify vowels based exclusively on static spectral cues. This was done by asking listeners to identify steady-state synthesized versions of 1520 vowels (76 talkers x 10 vowels x 2 repetitions) using Peterson and Barney's measured values of F0 and F1-F3 [J. Acoust. Soc. Am. 24, 175-184 (1952)]. The values for all control parameters remained constant throughout the 300-ms duration of each stimulus. A second set of 1520 signals was identical to these stimuli except that a falling pitch contour was used. The identification error rate for the flat-formant, flat-pitch signals was 27.3%, several times greater than the 5.6% error rate shown by Peterson and Barney's listeners. The introduction of a falling pitch contour resulted in a small but statistically reliable reduction in the error rate. The implications of these results for interpreting pattern recognition studies using the Peterson and Barney database are discussed. Results are also discussed in relation to the role of dynamic cues in vowel identification.
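The flat-formant, flat-pitch stimuli described above can be approximated with a very simple source-filter sketch: an impulse train at a fixed F0 passed through a cascade of second-order resonators at fixed formant frequencies, with all parameters held constant for 300 ms. This is an illustration only, not the synthesizer used in the study; the sample rate and the formant bandwidths are assumed values (the Peterson and Barney measurements supply only F0 and the formant frequencies).

```python
import numpy as np

FS = 16000  # sample rate in Hz; an assumed value, not from the paper

def resonator(x, f, bw, fs=FS):
    # Klatt-style second-order digital resonator at center frequency f (Hz)
    # with bandwidth bw (Hz).
    c = -np.exp(-2.0 * np.pi * bw / fs)
    b = 2.0 * np.exp(-np.pi * bw / fs) * np.cos(2.0 * np.pi * f / fs)
    a = 1.0 - b - c
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = (a * x[n]
                + (b * y[n - 1] if n >= 1 else 0.0)
                + (c * y[n - 2] if n >= 2 else 0.0))
    return y

def steady_vowel(f0, formants, bws, dur=0.3, fs=FS):
    # Impulse train at a constant F0, filtered through a cascade of
    # resonators at constant formant frequencies -- every control
    # parameter flat for the full duration, as in the study's stimuli.
    n = int(dur * fs)
    src = np.zeros(n)
    src[::int(round(fs / f0))] = 1.0
    out = src
    for f, bw in zip(formants, bws):
        out = resonator(out, f, bw, fs)
    return out / np.max(np.abs(out))

# Illustrative parameter values (bandwidths are assumptions).
sig = steady_vowel(130, [300, 2300, 3000], [60, 90, 150])
```

The result is a 300-ms waveform whose spectrum carries only static cues: fixed F0 and fixed formant peaks, with no formant movement or pitch contour.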
The purpose of this paper is to describe a software package that can be used for performing such routine tasks as controlling listening experiments (e.g., simple labeling, discrimination, sentence intelligibility, and magnitude estimation), recording responses and response latencies, analyzing and plotting the results of those experiments, displaying instructions, and making scripted audio-recordings. The software runs under Windows and is controlled by creating text files that allow the experimenter to specify key features of the experiment such as the stimuli that are to be presented, the randomization scheme, interstimulus and intertrial intervals, the format of the output file, and the layout of response alternatives on the screen. Although the software was developed primarily with speech-perception and psychoacoustics research in mind, it has uses in other areas as well, such as written or auditory word recognition, written or auditory sentence processing, and visual perception.
A quadratic discriminant classification technique was used to classify spectral measurements from vowels spoken by men, women, and children. The parameters used to train the discriminant classifier consisted of various combinations of fundamental frequency and the three lowest formant frequencies. Several nonlinear auditory transforms were evaluated. Unlike previous studies using a linear discriminant classifier, there was no advantage in category separability for any of the nonlinear auditory transforms over a linear frequency scale, and no advantage for spectral distances over absolute frequencies. However, it was found that parameter sets using nonlinear transforms and spectral differences reduced the differences between phonetically equivalent tokens produced by different groups of talkers.
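The classification approach described above can be sketched in a few lines: fit a separate Gaussian (mean vector plus full covariance matrix) to each vowel category and assign tokens to the category with the highest likelihood. The sketch below uses synthetic two-category clusters, not the actual measurements, and Traunmüller's approximation of the Bark scale stands in for one of the nonlinear auditory transforms evaluated; the means and standard deviations are illustrative assumptions.

```python
import numpy as np

def hz_to_bark(f_hz):
    # Traunmüller's approximation of the Bark auditory scale.
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def fit_qda(X, y):
    # Quadratic discriminant analysis: each class gets its own mean
    # vector and covariance matrix (unlike LDA's pooled covariance).
    return {lab: (X[y == lab].mean(axis=0),
                  np.cov(X[y == lab], rowvar=False))
            for lab in np.unique(y)}

def qda_predict(model, X):
    labels, scores = list(model), []
    for lab in labels:
        mu, cov = model[lab]
        inv = np.linalg.inv(cov)
        logdet = np.linalg.slogdet(cov)[1]
        d = X - mu
        # Gaussian log-likelihood per token, up to an additive constant.
        scores.append(-0.5 * (logdet + np.einsum("ij,jk,ik->i", d, inv, d)))
    return np.array(labels)[np.argmax(np.vstack(scores), axis=0)]

# Two toy vowel categories defined by (F0, F1, F2, F3) means in Hz;
# these numbers are illustrative, not measured values.
rng = np.random.default_rng(0)
means = {"iy": [130, 300, 2300, 3000], "aa": [120, 750, 1200, 2600]}
X = np.vstack([rng.normal(mu, [15, 40, 120, 150], size=(100, 4))
               for mu in means.values()])
y = np.array([lab for lab in means for _ in range(100)])

# Train and score on Bark-transformed parameters, as one candidate
# feature set among the combinations the study compared.
model = fit_qda(hz_to_bark(X), y)
acc = np.mean(qda_predict(model, hz_to_bark(X)) == y)
print(f"training accuracy: {acc:.2f}")
```

Swapping `hz_to_bark` for the identity function gives the linear-frequency feature set, so the same pipeline can compare transforms directly.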
This study was designed to measure the relative contributions to speech intelligibility of spectral envelope peaks (including, but not limited to formants) versus the detailed shape of the spectral envelope. The problem was addressed by asking listeners to identify sentences and nonsense syllables that were generated by two structurally identical source-filter synthesizers, one of which constructs the filter function based on the detailed spectral envelope shape while the other constructs the filter function using a purposely coarse estimate that is based entirely on the distribution of peaks in the envelope. Viewed in the broadest terms the results showed that nearly as much speech information is conveyed by the peaks-only method as by the detail-preserving method. Just as clearly, however, every test showed some measurable advantage for spectral detail, although the differences were not large in absolute terms.