In recent years there has been growing interest in masking that cannot be attributed to interactions in the cochlea, so-called informational masking (IM). Similarity in the acoustic properties of target and masker and uncertainty regarding the masker are the two major factors identified with IM. These factors involve quite different manipulations of signals and are believed to entail fundamentally different processes resulting in IM. Here, however, evidence is presented that these factors affect IM through their mutual influence on a single factor: the information divergence of target and masker given by Simpson-Fitter's d_a [Lutfi et al. (2012). J. Acoust. Soc. Am. 132, EL109-113]. Four experiments are described involving multitone pattern discrimination, multi-talker word recognition, sound-source identification, and sound localization. In each case standard manipulations of masker uncertainty and target-masker similarity (including the covariation of target-masker frequencies) are found to have the same effect on performance provided they produce the same change in d_a. The function relating d′ performance to d_a, moreover, appears to be linear with constant slope across listeners. The overriding dependence of IM on d_a is taken to reflect a general principle of perception that exploits differences in the statistical structure of signals to separate figure from ground.
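As a rough illustration of why uncertainty and similarity can trade against each other in a single index, the sketch below computes Simpson-Fitter's d_a for one stimulus parameter using its standard definition: the difference in means divided by the root-mean-square of the two standard deviations. The abstract does not state how the index was computed across parameters in the experiments, so the function name, the single-parameter setup, and the example values are all assumptions made for illustration.

```python
import math

def simpson_fitter_da(mean_target, mean_masker, sd_target, sd_masker):
    """Standard d_a: mean separation over the RMS of the two SDs.

    Hypothetical helper; the single-parameter form is an assumption,
    not the procedure reported in the cited work.
    """
    rms_sd = math.sqrt((sd_target ** 2 + sd_masker ** 2) / 2.0)
    if rms_sd == 0.0:
        raise ValueError("d_a is undefined when both standard deviations are zero")
    return (mean_target - mean_masker) / rms_sd

# Illustrative values only (Hz): same mean separation, different masker spread.
low_uncertainty = simpson_fitter_da(1000.0, 800.0, 100.0, 100.0)
high_uncertainty = simpson_fitter_da(1000.0, 800.0, 100.0, 300.0)
print(low_uncertainty, high_uncertainty)
```

In this form, moving the means closer together (greater similarity) and widening the masker distribution (greater uncertainty) both lower d_a, which is the sense in which the two manipulations can be placed on a common scale.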
There has been growing interest in recent years in masking that appears to have its origin at a central level of the auditory nervous system—so-called informational masking (IM). Masker uncertainty and target-masker similarity have been identified as the two major factors affecting IM; however, no theoretical framework currently exists that would give precise meaning to these terms necessary to evaluate their relative importance or model their effects. The present paper offers a first attempt at such a framework constructed within the doctrines of the theory of signal detection.
Research on hearing has long been challenged with understanding our exceptional ability to hear out individual sounds in a mixture (the so-called cocktail party problem). Two general approaches to the problem have been taken using sequences of tones as stimuli. The first has focused on our tendency to hear sequences, sufficiently separated in frequency, split into separate cohesive streams (auditory streaming). The second has focused on our ability to detect a change in one sequence, ignoring all others (auditory masking). The two phenomena are clearly related, but that relation has never been evaluated analytically. This article offers a detection-theoretic analysis of the relation between multitone streaming and masking that underscores the expected similarities and differences between these phenomena and the predicted outcome of experiments in each case. The key to establishing this relation is the function linking performance to the information divergence of the tone sequences, D_KL (a measure of the statistical separation of their parameters). A strong prediction is that streaming and masking of tones will be a common function of D_KL provided that the statistical properties of sequences are symmetric. Results of experiments are reported supporting this prediction.
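To make the symmetry condition concrete, the following sketch evaluates the Kullback-Leibler divergence between two univariate normal distributions using the textbook closed form. Treating a tone parameter as normally distributed, and the function name and values, are assumptions for illustration; the abstract does not specify the distributions or the computation used in the article.

```python
import math

def kl_normal(mu_p, sd_p, mu_q, sd_q):
    """Closed-form D_KL(P || Q) for univariate normals P and Q, in nats."""
    if sd_p <= 0.0 or sd_q <= 0.0:
        raise ValueError("standard deviations must be positive")
    return (math.log(sd_q / sd_p)
            + (sd_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sd_q ** 2)
            - 0.5)

# Equal spreads: the divergence is the same in both directions.
forward_equal = kl_normal(0.0, 1.0, 1.0, 1.0)
reverse_equal = kl_normal(1.0, 1.0, 0.0, 1.0)

# Unequal spreads: the two directions differ.
forward_unequal = kl_normal(0.0, 1.0, 0.0, 2.0)
reverse_unequal = kl_normal(0.0, 2.0, 0.0, 1.0)
print(forward_equal, reverse_equal, forward_unequal, reverse_unequal)
```

With equal standard deviations the divergence does not depend on which sequence is taken as the reference, whereas unequal standard deviations make it direction-dependent. This is one plausible reading of why the prediction is conditioned on symmetric statistical properties.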
Stimulus uncertainty is known to critically affect auditory masking, but its influence on auditory streaming has been largely ignored. Standard ABA-ABA tone sequences were made increasingly uncertain by increasing the sigma of normal distributions from which the frequency, level, or duration of tones were randomly drawn. Consistent with predictions based on a model of masking by Lutfi, Gilbertson, Chang, and Stamas [J. Acoust. Soc. Am. 134, 2160–2170 (2013)], the frequency difference for which A and B tones formed separate streams increased as a linear function of sigma in tone frequency but was much less affected by sigma in tone level or duration.
As the frequency separation of A and B tones in an ABAABA tone sequence increases, the tones are heard to split into separate auditory streams (fission threshold). The phenomenon is identified with our ability to ‘hear out’ individual sound sources in natural, multisource acoustic environments. One important difference, however, between natural sounds and the tone sequences used in most streaming studies is that natural sounds often vary unpredictably from one moment to the next. In the present study, fission thresholds were measured for ABAABA tone sequences made more or less predictable by sampling the frequencies, levels, or durations of the tones at random from normal distributions having different values of sigma (0–800 cents, 0–8 dB, and 0–40 ms, respectively, for frequency, level, and duration). Frequency variation on average had the greatest effect on threshold, but the function relating threshold to sigma was non-monotonic, first increasing and then decreasing for the largest value of sigma. Differences in the sigmas for A and B tones tended to reduce thresholds, but covariance in the A and B tones had little effect. The results suggest that the principles of perceptual organization underlying streaming may differ for predictable and unpredictable tone sequences.
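The frequency manipulation described above can be sketched as follows: each tone's frequency is perturbed by a normally distributed offset in cents around its nominal value. This is a minimal sketch, not the study's stimulus code; the function name, the base frequency, the triplet structure, and the independent per-tone sampling are assumptions, and level and duration jitter are omitted.

```python
import random

def jittered_aba_frequencies(base_hz, separation_cents, sigma_cents,
                             n_triplets, seed=None):
    """Return (label, frequency_hz) pairs for ABA triplets with frequency jitter.

    Hypothetical helper: jitter is drawn independently per tone from a
    normal distribution with standard deviation sigma_cents.
    """
    rng = random.Random(seed)
    nominal_b_hz = base_hz * 2.0 ** (separation_cents / 1200.0)

    def jitter(nominal_hz):
        # 1200 cents equals one octave, i.e. a factor of two in frequency.
        return nominal_hz * 2.0 ** (rng.gauss(0.0, sigma_cents) / 1200.0)

    sequence = []
    for _ in range(n_triplets):
        sequence.append(("A", jitter(base_hz)))
        sequence.append(("B", jitter(nominal_b_hz)))
        sequence.append(("A", jitter(base_hz)))
    return sequence

# Illustrative values: 400-cent nominal separation, sigma of 200 cents.
for label, hz in jittered_aba_frequencies(1000.0, 400.0, 200.0, 2, seed=1):
    print(label, round(hz, 1))
```

Setting `sigma_cents` to 0 reproduces a fully predictable sequence, and raising it toward the 800-cent upper limit mentioned above makes the A and B frequency distributions overlap increasingly for a fixed nominal separation.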