Recent research at our laboratories in the field of human auditory time perception revealed that the duration of short empty time intervals (less than approximately 200 msec) is considerably underestimated if they are immediately preceded by shorter time intervals. Within a certain range, the amount of subjective time shrinking is a monotonic function of the preceding time interval: the shorter it is, the more it shrinks its successor. In the present study, the preceding interval was kept constant at 50 msec, and the following interval, whose duration had to be judged, varied from 40 to 280 msec. The results showed that at up to 100 msec, the perceived duration increased to a much lesser extent than did the objective duration. Beyond 120 msec, the perceived duration quickly increased and reached a veridical value at 160 msec. Such a sudden change of perceived duration in a temporal pattern in which the objective duration varies gradually is a typical example of categorical perception. We suggest that such a categorization of the time dimension might be a clue to the processes of speech and music perception.
The effects on speech intelligibility of three different noise reduction algorithms (spectral subtraction, minimum mean squared error spectral estimation, and subspace analysis) were evaluated in two types of noise (car and babble) over a 12 dB range of signal-to-noise ratios (SNRs). Results from these listening experiments showed that most algorithms reduced intelligibility scores. Modeling of the results with a logit-shaped psychometric function showed that the degradation in intelligibility scores was largely congruent with a constant shift in SNR, although some additional degradation was observed at two SNRs, suggesting a limited interaction between the effects of noise suppression and SNR.
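The constant-shift reading of the result above can be illustrated with a minimal sketch. It assumes a two-parameter logistic psychometric function (midpoint and slope); the function names and all parameter values are hypothetical and chosen only for illustration, not taken from the study.

```python
import math


def logistic_psychometric(snr_db, midpoint_db, slope_per_db):
    """Proportion correct as a logistic (logit-shaped) function of SNR in dB."""
    return 1.0 / (1.0 + math.exp(-slope_per_db * (snr_db - midpoint_db)))


def with_constant_snr_shift(snr_db, midpoint_db, slope_per_db, snr_shift_db):
    """Constant-shift model: processed speech at a given SNR scores like
    unprocessed speech at an SNR that is snr_shift_db lower."""
    return logistic_psychometric(snr_db - snr_shift_db, midpoint_db, slope_per_db)


# Hypothetical illustration values (assumptions, not data from the paper).
MIDPOINT_DB = -4.0   # SNR giving 50% correct without processing
SLOPE_PER_DB = 0.5   # steepness of the psychometric function
SNR_SHIFT_DB = 1.0   # degradation expressed as an equivalent SNR loss

if __name__ == "__main__":
    for snr in range(-10, 3, 3):  # spans a 12 dB range
        unprocessed = logistic_psychometric(snr, MIDPOINT_DB, SLOPE_PER_DB)
        processed = with_constant_snr_shift(
            snr, MIDPOINT_DB, SLOPE_PER_DB, SNR_SHIFT_DB
        )
        print(f"SNR {snr:+3d} dB: unprocessed {unprocessed:.3f}, processed {processed:.3f}")
```

Under this model the processed curve is the unprocessed curve moved sideways along the SNR axis, so any additional degradation at particular SNRs would show up as a deviation from the shifted curve.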
Electrical stimulation of auditory nerve fibers using cochlear implants (CI) shows psychophysical forward masking (pFM) up to several hundreds of milliseconds. By contrast, recovery of electrically evoked compound action potentials (eCAPs) from forward masking (eFM) was shown to be more rapid, with time constants no greater than a few milliseconds. These discrepancies suggested two main contributors to pFM: a rapid-recovery process due to refractory properties of the auditory nerve and a slow-recovery process arising from more central structures. In the present study, we investigate whether the use of different maskers between eCAP and psychophysical measures, specifically single-pulse versus pulse train maskers, may have been a source of confound. In experiment 1, we measured eFM using the following: a single-pulse masker, a 300-ms low-rate pulse train masker (LTM, 250 pps), and a 300-ms high-rate pulse train masker (HTM, 5000 pps). The maskers were presented either at the same physical current (Φ) or at the same perceptual (Ψ) level corresponding to comfortable loudness. Responses to a single-pulse probe were measured for masker-probe intervals ranging from 1 to 512 ms. Recovery from masking was much slower for pulse trains than for the single-pulse masker. When presented at Φ level, HTM produced more and longer-lasting masking than LTM. However, results were inconsistent when LTM and HTM were compared at Ψ level. In experiment 2, masked detection thresholds of single-pulse probes were measured using the same pulse train masker conditions. In line with our eFM findings, masked thresholds for HTM were higher than those for LTM at Φ level. However, the opposite result was found when the pulse trains were presented at Ψ level. Our results confirm the presence of slow-recovery phenomena at the level of the auditory nerve in CI users, as previously shown in animal studies. Inconsistencies between eFM and pFM results, despite using the same masking conditions, further underline the importance of comparing electrophysiological and psychophysical measures with identical stimulation paradigms.
All signals, except sine waves, exhibit intrinsic modulations that affect perceptual masking. Reducing the physical intrinsic modulations of a broadband signal does not necessarily have a perceptual impact: auditory filtering can reintroduce modulations. Broadband signals with low intrinsic modulations after auditory filtering have proved difficult to design. To that end, this paper introduces a class of signals termed pulse-spreading harmonic complexes (PSHCs). PSHCs are generated by summing harmonically related components with phases chosen such that the resulting waveform exhibits pulses equally spaced within a repetition period. The order of a PSHC determines its pulse rate. Simulations with a gammatone filterbank suggest an optimal pulse rate at which, after auditory filtering, the PSHC's intrinsic modulations are lowest. These intrinsic modulations appear to be weaker than those of broadband pseudo-random (PR) or low-noise (LN) noise. This hypothesis was tested in a modulation-detection experiment involving five modulation rates ranging from 8 to 128 Hz and both broadband and narrowband carriers using PSHCs, PR, and LN noise. PSHCs showed the lowest thresholds of all broadband signals. Results imply that optimized PSHCs exhibit fewer intrinsic modulations after auditory filtering than any other broadband signal previously considered.
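As a rough illustration of the kind of stimulus used in a modulation-detection task, the sketch below imposes sinusoidal amplitude modulation on a carrier. It does not generate a PSHC (the phase rule is not given here); a Gaussian noise carrier stands in for illustration. The helper name, the sample rate, the octave spacing of the five rates, and the depth convention (20 log10 m, so 0 dB is full modulation) are all assumptions.

```python
import math
import random


def apply_sam(carrier, sample_rate_hz, mod_rate_hz, depth_db):
    """Apply sinusoidal amplitude modulation to a carrier (list of samples).

    depth_db is 20*log10(m), where m is the modulation index
    (0 dB means full modulation, m = 1).
    """
    m = 10.0 ** (depth_db / 20.0)
    return [
        x * (1.0 + m * math.sin(2.0 * math.pi * mod_rate_hz * n / sample_rate_hz))
        for n, x in enumerate(carrier)
    ]


if __name__ == "__main__":
    SAMPLE_RATE_HZ = 16000                 # assumed sample rate
    MOD_RATES_HZ = [8, 16, 32, 64, 128]    # assumed octave spacing from 8 to 128 Hz

    rng = random.Random(0)
    # Stand-in carrier: 0.5 s of Gaussian noise (not a PSHC).
    carrier = [rng.gauss(0.0, 1.0) for _ in range(SAMPLE_RATE_HZ // 2)]

    for rate in MOD_RATES_HZ:
        target = apply_sam(carrier, SAMPLE_RATE_HZ, rate, depth_db=-12.0)
        print(f"{rate:4d} Hz modulation: {len(target)} samples")
```

In such a task the listener has to tell the modulated target apart from the unmodulated carrier, so a carrier whose own envelope fluctuates less after auditory filtering should allow smaller modulation depths to be detected.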
When one very short empty time interval follows right after another, the second one can be underestimated considerably, but only if it is longer than the first one. We coined the term "time-shrinking" for this illusory phenomenon in our previous studies. Although we could relate our finding to some studies of rhythm perception, we were not able to explain the illusion. The present article presents our attempt to understand the mechanism that causes the time-shrinking. Four experiments are reported. The first one ruled out the possibility that the illusion results from a difficulty in resolving the temporal structure. The second experiment showed that the listener was not inadvertently judging the duration of the first interval instead of that of the second one. In addition, this experiment yielded more information about the time window within which the illusion occurs. The third experiment showed that forward masking of the sound markers, delimiting the empty durations, could not explain the illusion either. Furthermore, this experiment revealed a clue to the mechanism of time-shrinking: competition between expected and observed temporal positions. The fourth experiment further examined the temporal conditions that give rise to the illusion and showed that categorical perception plays a crucial role in the formation of the illusion. In the general discussion, we argue that the illusion is due to an asymmetric process of temporal assimilation.
Noise- and sine-carrier vocoders are often used to acoustically simulate the information transmitted by a cochlear implant (CI). However, sine-waves fail to mimic the broad spread of excitation produced by a CI and noise-bands contain intrinsic modulations that are absent in CIs. The present study proposes pulse-spreading harmonic complexes (PSHCs) as an alternative acoustic carrier in vocoders. Sentence-in-noise recognition was measured in 12 normal-hearing subjects for noise-, sine-, and PSHC-vocoders. Consistent with the amount of intrinsic modulations present in each vocoder condition, the average speech reception threshold obtained with the PSHC-vocoder was higher than with sine-vocoding but lower than with noise-vocoding.
Using the data presented in the accompanying paper [Hilkhuysen et al., J. Acoust. Soc. Am. 131, 531-539 (2012)], the ability of six metrics to predict intelligibility of speech in noise before and after noise suppression was studied. The metrics considered were the Speech Intelligibility Index (SII), the fractional Articulation Index (fAI), the coherence intelligibility index based on the mid-levels in speech (CSIImid), an extension of the Normalized Coherence Metric (NCM+), a part of the speech-based envelope power model (pre-sEPSM), and the Short Term Objective Intelligibility measure (STOI). Three of the measures, SII, CSIImid, and NCM+, overpredicted intelligibility after noise reduction, whereas fAI underpredicted it. The pre-sEPSM metric worked well for speech in babble but failed with car noise. STOI gave the best predictions, but overall the size of the intelligibility prediction errors was greater than the change in intelligibility caused by noise suppression. Suggestions for improvements of the metrics are discussed.