Two versions of a cascaded add, attenuate, and delay circuit were used to generate iterated rippled noise (IRN) stimuli. IRN stimuli produce a repetition pitch whose strength relative to the noise can be varied by changing the type of circuit, the attenuation, or the number of iterations in the circuit. Listeners were asked to discriminate between various pairs of IRN stimuli that differed in the type of network used to generate the sounds or the number of iterations (n = 1, 2, 3, 4, 7, and 9). Performance was determined for IRN stimuli generated with delays of 2, 4, and 8 ms and for four bandpass filter conditions (0-2000, 250-2000, 500-2000, and 750-2000 Hz). Some IRN stimuli were extremely difficult to discriminate despite relatively large spectral differences, while other IRN stimuli produced readily discriminable changes in perception despite small spectral differences. These contrasting results are inconsistent with simple spectral explanations for the perception of IRN stimuli. An explanation based on the first peak of the autocorrelation function of IRN stimuli is consistent with the results. Simulations of the processing performed by the peripheral auditory system (i.e., interval histograms and correlograms) produce results which are consistent with the involvement of these temporal processes in the perception of IRN stimuli.
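The cascaded add, attenuate, and delay networks can be sketched in a few lines of NumPy. The function names and the two update rules below ("add-same," which feeds the running output back through the delay, and "add-original," which adds a delayed copy of the unprocessed noise each pass) are illustrative assumptions about the two circuit types, not the exact implementations used in the study:

```python
import numpy as np

def irn_add_same(noise, delay_samples, gain, n_iter):
    """'Add-same' network: each iteration delays the running output,
    attenuates it by `gain`, and adds it back to the running output."""
    y = noise.copy()
    for _ in range(n_iter):
        delayed = np.zeros_like(y)
        delayed[delay_samples:] = y[:-delay_samples]
        y = y + gain * delayed
    return y

def irn_add_original(noise, delay_samples, gain, n_iter):
    """'Add-original' network: each iteration delays and attenuates the
    running output, then adds it to the *original* noise."""
    y = noise.copy()
    for _ in range(n_iter):
        delayed = np.zeros_like(y)
        delayed[delay_samples:] = y[:-delay_samples]
        y = noise + gain * delayed
    return y
```

For a 4-ms delay at a 48-kHz sampling rate, `delay_samples` would be 192; with `gain = 0` both networks leave the noise unchanged, which makes the circuits easy to sanity-check.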
The contribution of temporal fine structure (TFS) cues to consonant identification was assessed in normal-hearing listeners with two speech-processing schemes designed to remove temporal envelope (E) cues. Stimuli were processed vowel-consonant-vowel speech tokens. Derived from the analytic signal, carrier signals were extracted from the output of a bank of analysis filters. The "PM" and "FM" processing schemes estimated a phase- and frequency-modulation function, respectively, of each carrier signal and applied them to a sinusoidal carrier at the analysis-filter center frequency. In the FM scheme, processed signals were further restricted to the analysis-filter bandwidth. A third scheme retaining only E cues from each band was used for comparison. Stimuli processed with the PM and FM schemes were found to be highly intelligible (50-80% correct identification) over a variety of experimental conditions designed to affect the putative reconstruction of E cues subsequent to peripheral auditory filtering. Analysis of confusions between consonants showed that the contribution of TFS cues was greater for place than manner of articulation, whereas the converse was observed for E cues. Taken together, these results indicate that TFS cues convey important phonetic information that is not solely a consequence of E reconstruction.
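The analytic-signal decomposition underlying this kind of processing can be illustrated for a single analysis band. The sketch below is a simplified, single-band stand-in for the PM scheme, assuming a Hilbert-transform analytic signal; the function name and parameters are illustrative, and the actual study used a full analysis filter bank:

```python
import numpy as np
from scipy.signal import hilbert

def split_e_tfs(band_signal, fs, cf):
    """Split one analysis band into its envelope (E cue) and a
    PM-style TFS carrier: the band's phase-modulation function
    applied to a unit-amplitude sinusoid at the analysis-filter
    center frequency `cf` (Hz)."""
    analytic = hilbert(band_signal)
    env = np.abs(analytic)                  # temporal envelope (E)
    phase = np.unwrap(np.angle(analytic))   # instantaneous phase
    t = np.arange(len(band_signal)) / fs
    pm = phase - 2 * np.pi * cf * t         # phase modulation re: CF carrier
    tfs = np.cos(2 * np.pi * cf * t + pm)   # envelope-flattened TFS carrier
    return env, tfs
```

For a pure tone at the center frequency, the envelope is flat and the TFS carrier reproduces the tone, which is a convenient check that the decomposition discards E while preserving fine structure.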
Listeners identified spoken words, letters, and numbers and the spatial location of these utterances in three listening conditions as a function of the number of simultaneously presented utterances. The three listening conditions were a normal listening condition, in which the sounds were presented over seven possible loudspeakers to a listener seated in a sound-deadened listening room; a one-headphone listening condition, in which a single microphone that was placed in the listening room delivered the sounds to a single headphone worn by the listener in a remote room; and a stationary KEMAR listening condition, in which binaural recordings from an acoustic manikin placed in the listening room were delivered to a listener in the remote room. The listeners were presented one, two, or three simultaneous utterances. The results show that utterance identification was better in the normal listening condition than in the one-headphone condition, with the KEMAR listening condition yielding intermediate levels of performance. However, the differences between listening in the normal and in the one-headphone conditions were much smaller when two, rather than three, utterances were presented at a time. Localization performance was good for both the normal and the KEMAR listening conditions and at chance for the one-headphone condition. The results suggest that binaural processing is probably more important for solving the "cocktail party" problem when there are more than two concurrent sound sources. The cocktail party effect is an extensively cited auditory phenomenon (see Blauert, 1983) that has in recent years been reformulated as a problem of sound source determination (see Yost, 1992a) or sound source segregation (see Bregman, 1990). That is, how do we determine the sources of sound in multisource acoustic conditions? Cherry's quotation suggests several variables that might contribute to a solution to this problem.
Over the years, several authors (see Yost, 1992a and 1992b, for a review) have added to Cherry's original list of possible solutions.
Thresholds for detecting sinusoidal amplitude modulation (AM) of a wideband noise carrier were measured as a function of the duration of the modulating signal. The carrier was either: (a) gated with a duration that exceeded the duration of modulation by the combined stimulus rise and fall times; (b) presented with a fixed duration that included a 500-ms carrier fringe preceding the onset of modulation; or (c) on continuously. In condition (a), the gated-carrier temporal modulation transfer functions (TMTFs) exhibited a bandpass characteristic. For AM frequencies above the individual subject's TMTF high-pass segment, the mean slope of the integration functions was -7.46 dB per log unit duration. For the fringe and continuous-carrier conditions [(b) and (c)], the mean slopes of the integration functions were, respectively, -9.30 and -9.36 dB per log unit duration. Simulations based on integration of the output of an envelope detector approximate the results from the gated-carrier conditions. The more rapid rates of integration obtained in the fringe and continuous-carrier conditions may be due to "overintegration" where, at brief modulation durations, portions of the unmodulated carrier envelope are included in the integration of modulating signal energy.
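Sinusoidal amplitude modulation of a noise carrier is straightforward to sketch. The function below is a minimal illustration, not the stimulus generation used in the study; the modulator starting phase and parameter names are assumptions, and detection thresholds in this literature are commonly expressed as 20·log10(m), where m is the modulation depth:

```python
import numpy as np

def apply_sam(carrier, fs, fm, m):
    """Apply sinusoidal amplitude modulation at rate `fm` (Hz) and
    depth `m` (0 to 1) to `carrier` sampled at `fs` Hz. The
    modulator starts in sine phase; overall power is not
    compensated for the modulation, for simplicity."""
    t = np.arange(len(carrier)) / fs
    return carrier * (1.0 + m * np.sin(2 * np.pi * fm * t))
```

With `m = 0` the carrier is returned unchanged, and for a constant carrier the modulated envelope swings between 1 - m and 1 + m, which makes the depth parameter easy to verify.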
A cascaded add, delay (d ms), and attenuate (−1 ≤ g ≤ 1) circuit excited with noise produces iterated rippled noise (IRN) stimuli. The matched pitch and discriminability between pairs of IRN stimuli were studied as a function of g, d, and the number of circuit iterations (n). For g > 0, the pitch of all IRN stimuli equals 1/d. For g < 0, pitch depends on n: for small n, there were two pitches in the region of 1/d, while for large n there was a single pitch equal to 1/(2d). Peaks in the autocorrelation function of IRN stimuli accounted for all of the results. Peaks in the autocorrelation functions of IRN stimuli indicate the number of intervals in the waveform with durations pd (p = 1, 2, ..., n) and, for g < 0, intervals related to peaks near 1/md (m = odd integers) caused by assumed auditory filtering. The number of intervals (i.e., the heights of the autocorrelation peaks) determines the discriminability between IRN stimuli, while the reciprocal of the interval duration determines the matched pitch. These results support a temporal rather than a spectral account of the pitch of IRN stimuli. [Work supported by NIH.]
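The autocorrelation account of IRN pitch can be illustrated with a simple peak-picking estimator: the first major autocorrelation peak lies at the lag of the dominant interval duration, its reciprocal gives the matched pitch, and its normalized height indexes pitch strength. The function below is a minimal sketch under those assumptions; the search range parameters are illustrative:

```python
import numpy as np

def acf_pitch(x, fs, fmin=100.0, fmax=400.0):
    """Estimate pitch as fs/lag of the largest normalized
    autocorrelation peak with lag between 1/fmax and 1/fmin
    seconds; also return the peak height (a pitch-strength proxy)."""
    x = x - np.mean(x)
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    acf = acf / acf[0]                    # normalize to zero-lag value
    lo = int(fs / fmax)                   # shortest lag considered
    hi = int(fs / fmin)                   # longest lag considered
    lag = lo + np.argmax(acf[lo:hi])
    return fs / lag, acf[lag]
```

For a single delay-and-add iteration with g = 1, the normalized peak height at lag d is about g/(1 + g^2) = 0.5, and the estimated pitch is 1/d, consistent with the temporal account summarized above.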