The perception of stimuli with ramped envelopes (gradual attack and abrupt decay) and damped envelopes (abrupt attack and gradual decay) was studied in subjective and objective tasks. Magnitude estimation (ME) of perceived duration was measured for broadband noise, 1.0-kHz, and 8.0-kHz tones for durations between 10 and 200 ms. Damped sounds were judged to be shorter than ramped sounds. Matching experiments between sounds with ramped, damped, and rectangular envelopes also showed that damped sounds are perceived to be shorter than ramped sounds and, additionally, that the effect arises because the damped sound is judged shorter than a rectangular-gated sound, not because the ramped sound is judged longer than a rectangular-gated sound. These matching studies also demonstrate that the size of the effect is larger for tones (factor of 2.0) than for broadband noise (factor of 1.5). There are two plausible explanations for the finding that damped sounds are judged to be shorter than ramped or rectangular-gated sounds: (1) the abrupt offset at a high level of the ramped sound (or a rectangular-gated sound) results in a persistence of perception (forward masking) that is considered in judgments of the subjective duration; and (2) listeners may ignore a portion of the decay of a damped sound because they consider it an "echo" [Stecker and Hafter, J. Acoust. Soc. Am. 107, 3358-3368 (2000)]. In another experiment, duration discrimination for broadband noise with ramped, damped, and rectangular envelopes was studied as a function of duration (10 to 100 ms) to determine if differences in perceived duration are associated with the size of measured Weber fractions. A forced-choice adaptive procedure was used. Duration discrimination was poorer for noise with ramped envelopes than for noise with damped or rectangular envelopes. This result is inconsistent with differences in perceived duration, and no explanation was readily apparent.
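The ramped and damped envelopes described above can be illustrated with a minimal sketch. The abstract does not state the envelope shape, so the exponential form, the `half_life` parameter, and all function names here are assumptions made for illustration only. The sketch shows the one property the definitions imply: a ramped envelope is a damped envelope reversed in time, so the two carry the same energy and differ only in temporal order.

```python
def damped_envelope(n_samples, half_life):
    # Abrupt attack, gradual decay: starts at 1.0 and halves
    # every `half_life` samples (assumed exponential shape).
    return [0.5 ** (i / half_life) for i in range(n_samples)]


def ramped_envelope(n_samples, half_life):
    # Gradual attack, abrupt decay: the damped envelope reversed in time.
    return damped_envelope(n_samples, half_life)[::-1]


def energy(envelope):
    # Sum of squared amplitudes, used to compare the two envelopes.
    return sum(a * a for a in envelope)


damped = damped_envelope(8, 2.0)
ramped = ramped_envelope(8, 2.0)
```

Multiplying either list sample by sample with a noise or tone carrier would give the corresponding gated stimulus; the carrier and sampling rate are left out because the abstract does not specify them.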
The effect of frequency uncertainty on the detection of tonal signals in noise was studied using a modified probe-signal method. Widths of the listening bands used during detection were measured directly, allowing for an analysis that separates the effects of having to monitor multiple independent bands from those due to limited frequency resolution. Uncertainty was varied by beginning each trial with a cue consisting of one, two, or four randomly chosen, simultaneously presented tones. An expected signal, whose frequency matched one of the components in a cue, was presented on a majority of trials. However, on remaining trials, the signal was a probe, which meant that its frequency differed from one of the components in the cue by a constant ratio. Performance as measured in percent correct declined for probes at increasingly distant ratios from the expected values. The results were converted to dB using individual psychometric functions for expected signals and listening bands were fitted using the rounded exponential filter of Patterson et al. [J. Acoust. Soc. Am. 72, 1788-1803 (1982)]. The obtained bandwidths are comparable to those reported using notched-noise maskers, but there is a small but consistent increase in bandwidth with increased numbers of components in the cues. The primary result is that the effects due to uncertainty are well described by a 1-of-M orthogonal band model, which takes into consideration limitations of the detector, including the widths of the listening bands.
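As a rough illustration of the rounded exponential filter mentioned above, the following sketch uses the single-parameter roex(p) form, W(g) = (1 + pg)exp(-pg), where g is the deviation from the center frequency divided by the center frequency. Whether the study fitted this exact variant is an assumption, and the function names and the example value of `p` are hypothetical. For this form the equivalent rectangular bandwidth is 4·fc/p.

```python
import math


def roex_weight(f_hz, fc_hz, p):
    # Normalized frequency deviation from the filter center.
    g = abs(f_hz - fc_hz) / fc_hz
    # Single-parameter roex(p) weighting; equals 1.0 at the center.
    return (1.0 + p * g) * math.exp(-p * g)


def roex_erb(fc_hz, p):
    # Equivalent rectangular bandwidth of a symmetric roex(p) filter.
    return 4.0 * fc_hz / p


def attenuation_db(f_hz, fc_hz, p):
    # Filter attenuation at f_hz relative to the center, in dB (power weighting).
    return -10.0 * math.log10(roex_weight(f_hz, fc_hz, p))
```

With an illustrative slope of `p = 25` at `fc_hz = 1000`, `roex_erb` gives 160 Hz, and a larger `p` gives a narrower listening band.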
The effectiveness of two types of tonal cues for reducing frequency uncertainty was studied in a tonal detection-in-noise task. Signals varied at random from trial to trial over the range 750-3000 Hz. The three conditions included: (1) maximum uncertainty in which there were no cues; (2) minimal uncertainty in which "iconic cues" were identical to the signal to be detected; and (3) partial uncertainty in which "relative cues" were set to 2/3 of the signal frequency, i.e., at the musical 5th. Results show that relative cues and iconic cues were both effective in reducing uncertainty compared to the no-cue condition, but that performance with relative cues was poorer than with iconic cues by 1.4 dB. In addition, a modified probe-signal method was used to estimate the widths of the subjective listening bands. Application of a model of the auditory filter [R. Patterson and B. C. J. Moore, Frequency Selectivity in Hearing, edited by B. C. J. Moore (Academic, New York, 1986)] to these data showed that the subjective listening bands used with iconic cues were similar in width to typical measures of the critical band but that the bands used with relative cues were wider by a factor of roughly 1.6.
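The relation between a signal and its relative cue is simple arithmetic, sketched here for concreteness. Only the 2/3 ratio and the 750-3000 Hz signal range come from the abstract; the function names are hypothetical.

```python
import math


def relative_cue_frequency(signal_hz):
    # The relative cue sits at 2/3 of the signal frequency,
    # i.e., a musical fifth below the signal.
    return signal_hz * 2.0 / 3.0


def interval_in_semitones(f_low_hz, f_high_hz):
    # Size of a frequency interval in equal-tempered semitones.
    return 12.0 * math.log2(f_high_hz / f_low_hz)


lowest_cue = relative_cue_frequency(750.0)
highest_cue = relative_cue_frequency(3000.0)
fifth = interval_in_semitones(lowest_cue, 750.0)
```

Over the stated signal range the cues therefore span 500 to 2000 Hz, and the 3:2 ratio corresponds to just over seven semitones.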
Previous studies have documented that speech with flattened or inverted fundamental frequency (F0) contours is less intelligible than speech with natural variations in F0. The purpose of the present study was to further investigate how F0 manipulations affect speech intelligibility in background noise. Speech recognition in noise was measured for sentences having the following F0 contours: unmodified, flattened at the median, natural but exaggerated, inverted, and sinusoidally frequency modulated at rates of 2.5 and 5.0 Hz, rates shown to make vowels more perceptually salient in background noise. Five talkers produced 180 stimulus sentences, with 30 unique sentences per F0 contour condition. Flattening or exaggerating the F0 contour reduced key word recognition performance by 13% relative to the naturally produced speech. Inverting or sinusoidally frequency modulating the F0 contour reduced performance by 23% relative to typically produced speech. These results support the notion that linguistically incorrect or misleading cues have a greater deleterious effect on speech understanding than linguistically neutral cues.
The association between temporal-masking patterns, duration, and loudness for broadband noises with ramped and damped envelopes was examined. Duration and loudness matches between the ramped and damped sounds differed significantly. Listeners perceived the ramped stimuli to be longer and louder than the damped stimuli, but the outcome was biased by the stimulus context. Next, temporal-masking patterns were measured for ramped and damped broadband noises using three (0.5, 1.5, and 4.0 kHz) 10 ms probe tones presented individually at various temporal delays. Predictions of subjective duration derived from masking results underpredicted the matching results. Loudness estimates derived from models that assume persistence of neural activity after stimulus offset [Glasberg, B. R., and Moore, B. C. J. (2002). "A model of loudness applicable to time-varying sounds," J. Audio Eng. Soc. 50, 331-341; Chalupper, J., and Fastl, H. (2002). "Dynamic loudness model (DLM) for normal and hearing-impaired listeners," Acust. Acta Acust. 88, 378-386] were greater for ramped sounds than for damped sounds and were close to the average results obtained via the matching task. Differences in simultaneous-masked thresholds for these stimuli could not account for the loudness-matching results. Decay suppression of the later occurring portion of the damped stimulus may account for the differences in perception due to the stimulus context; however, a parsimonious implementation of this process that accounts for both subjective duration and loudness judgments remains unclear.
Sounds that are equivalent in all aspects except for their temporal envelope are perceived differently. Sounds with rising temporal envelopes are perceived as louder, longer, and show a greater change in loudness throughout their duration than sounds with falling temporal envelopes. Stecker and Hafter (2000) proposed that participants ignore the decay portion of sounds with falling temporal envelopes to account for observed loudness differences, but there is no empirical evidence to support this hypothesis. To test this idea, two duration-matching experiments were performed. One experiment used broadband noise and the other natural stimuli. Different groups of participants were given different instruction sets asking them to (1) simply match the duration or (2) include all aspects of the sounds. Both experiments produced the same result. The first instruction set, which represented participants' natural biases, yielded shorter subjective durations for sounds with falling temporal envelopes than for sounds with rising temporal envelopes. By contrast, asking participants to include all aspects of the sounds significantly reduced the size of the asymmetry in subjective duration, a result that supports Stecker and Hafter's hypothesis. This segregation of the stimulus at the perceptual level is consistent with observed asymmetries in loudness change and overall loudness for sounds with rising and falling temporal envelopes, but it does not account for the entire effect. The remaining portion of
Using the new table will provide more accurate estimates of the 95% critical range for successive administrations of word recognition tests.