Perceptual events derive their significance to an animal from their meaning about the world, that is, from the information they carry about their causes. The brain should thus be able to efficiently infer the causes underlying our sensory events. Here we use multisensory cue combination to study causal inference in perception. We formulate an ideal-observer model that infers whether two sensory cues originate from the same location and that also estimates their location(s). This model accurately predicts the nonlinear integration of cues by human subjects in two auditory-visual localization tasks. The results show that humans can indeed efficiently infer the causal structure as well as the location of causes. By combining insights from the study of causal inference with the ideal-observer approach to sensory cue combination, we show that the capacity to infer causal structure is not limited to conscious, high-level cognition; it is also performed continually and effortlessly in perception.
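A minimal sketch of the kind of ideal-observer computation described here, assuming Gaussian sensory noise and a zero-mean Gaussian prior over source location (the closed-form marginal likelihoods below follow from those standard assumptions; the function name and parameter values are illustrative, not the study's):

```python
import math

def posterior_common_cause(x_v, x_a, sigma_v, sigma_a, sigma_p, p_common):
    """Posterior probability that a visual cue x_v and an auditory cue x_a
    share one cause, assuming Gaussian noise (sigma_v, sigma_a), a zero-mean
    Gaussian location prior (sigma_p), and prior P(common cause) = p_common."""
    var_v, var_a, var_p = sigma_v**2, sigma_a**2, sigma_p**2
    # Likelihood of the cue pair under a common cause (C = 1): the single
    # source location is integrated out analytically.
    var1 = var_v * var_a + var_v * var_p + var_a * var_p
    l1 = math.exp(-((x_v - x_a)**2 * var_p + x_v**2 * var_a + x_a**2 * var_v)
                  / (2.0 * var1)) / (2.0 * math.pi * math.sqrt(var1))
    # Likelihood under independent causes (C = 2): each cue has its own source.
    var2v, var2a = var_v + var_p, var_a + var_p
    l2 = math.exp(-(x_v**2 / var2v + x_a**2 / var2a) / 2.0) \
         / (2.0 * math.pi * math.sqrt(var2v * var2a))
    return l1 * p_common / (l1 * p_common + l2 * (1.0 - p_common))
```

Nearby cues yield a high posterior probability of a common cause (integration), widely separated cues a low one (segregation), which is what produces the nonlinear pattern of integration the abstract describes.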
The question of which strategy is employed in human decision making has been studied extensively in the context of cognitive tasks; however, it has not been investigated systematically in the context of perceptual tasks. The goal of this study was to gain insight into the decision-making strategy used by human observers in a low-level perceptual task. Data from more than 100 individuals who participated in an auditory-visual spatial localization task were evaluated to examine which of three plausible strategies best accounted for each observer's behavior. This task is well suited to exploring this question because it involves an implicit inference about whether the auditory and visual stimuli were caused by the same object or by independent objects, and because different strategies for using this causal inference lead to distinctly different spatial estimates and response patterns. For example, employing the commonly used cost function of minimizing the mean squared error of the spatial estimates results in a weighted averaging of the estimates corresponding to the different causal structures ("model averaging"). A strategy that minimizes the error in the inferred causal structure instead selects the most likely causal structure and commits to it in the subsequent inference of location ("model selection"). A third strategy selects a causal structure in proportion to its posterior probability, thus attempting to match the probability of the inferred causal structure ("probability matching"); this strategy has previously been reported predominantly in cognitive tasks. Comparing these three strategies, the behavior of the vast majority of observers in this perceptual task was most consistent with probability matching. Although this appears to be a suboptimal strategy, and hence a surprising choice for the perceptual system to adopt, we discuss potential advantages of such a strategy for perception.
Recently, it has been shown that visual perception can be radically altered by signals of other modalities. For example, when a single flash is accompanied by multiple auditory beeps, it is often perceived as multiple flashes; this effect is known as the sound-induced flash illusion. To investigate the principles underlying this illusion, we developed an ideal observer (derived using Bayes' rule) and compared human judgments with those of the ideal observer for this task. The human observers' performance was highly consistent with that of the ideal observer in all conditions, ranging from no interaction to partial integration to complete integration, suggesting that the rule used by the nervous system to decide when and how to combine auditory and visual signals is statistically optimal. Our findings show that the sound-induced flash illusion is an epiphenomenon of this general, statistically optimal strategy.
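A minimal discrete sketch of such an ideal observer: per-count likelihoods for each modality are combined with a joint prior that favors congruent counts. The numbers below are illustrative, not fitted values from the study:

```python
def flash_posterior(like_v, like_a, prior):
    """Posterior over (number of flashes, number of beeps) on a discrete grid.
    like_v[i] = P(visual signal | i+1 flashes);
    like_a[j] = P(auditory signal | j+1 beeps);
    prior[i][j] = joint prior over counts (assumed to favor congruent counts)."""
    post = [[like_v[i] * like_a[j] * prior[i][j] for j in range(len(like_a))]
            for i in range(len(like_v))]
    z = sum(p for row in post for p in row)
    return [[p / z for p in row] for row in post]

# Illustrative numbers: vision weakly favors one flash, audition strongly
# favors two beeps, and the prior puts extra mass on matching counts.
post = flash_posterior([0.7, 0.3], [0.1, 0.9], [[0.4, 0.1], [0.1, 0.4]])
p_two_flashes = sum(post[1])  # marginal probability of perceiving two flashes
```

With these numbers the strong two-beep evidence, coupled through the congruence prior, pushes the flash marginal past one half: the single flash is more likely to be perceived as two, which is the illusion.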
Subjects routinely control the vigor with which they emit motoric responses. However, the bulk of formal treatments of decision-making ignores this dimension of choice. A recent theoretical study suggested that action vigor should be influenced by the experienced average reward rate, and that this rate is encoded by tonic dopamine in the brain. We previously examined how the average reward rate modulates vigor, as exemplified by response times, and found a measure of agreement with the first suggestion. In the current study, we examined the second suggestion, namely the potential influence of dopamine signaling on vigor. Ninety healthy subjects participated in a double-blind experiment in which they received one of the following: placebo, L-DOPA (which increases dopamine levels in the brain), or citalopram (which has a selective, if complex, effect on serotonin levels). Subjects performed multiple trials of a rewarded odd-ball discrimination task in which we varied the potential reward over time in order to exercise the putative link between vigor and average reward rate. Replicating our previous findings, we found that a significant fraction of the variance in subjects' responses could be explained by our experimentally manipulated changes in the average reward rate. Crucially, this relationship was significantly stronger under L-DOPA than under placebo, suggesting that the impact of average reward levels on action vigor is indeed subject to a dopaminergic influence.
Our nervous system typically processes signals from multiple sensory modalities at any given moment and is therefore faced with two important problems: which of the signals are caused by a common event, and how to combine those signals. We investigated human perception in the presence of auditory, visual, and tactile stimulation in a numerosity judgment task. Observers were presented with stimuli in one, two, or three modalities simultaneously and were asked to report their percepts in each modality. The degree of congruency between the modalities varied across trials; for example, a single flash was paired in some trials with two beeps and two taps. Cross-modal illusions were observed in most conditions in which there was incongruence among the two or three stimuli, revealing robust interactions among the three modalities in all directions. The observers' bimodal and trimodal percepts were remarkably consistent with a Bayes-optimal strategy of combining the evidence in each modality with the prior probability of the events. These findings provide evidence that the combination of sensory information among three modalities follows optimal statistical inference across the entire spectrum of conditions.
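The same discrete Bayesian scheme extends naturally to three modalities. In this sketch the prior form, a simple multiplicative boost for fully congruent counts, and all numbers are assumptions for illustration only:

```python
from itertools import product

def trimodal_posterior(like_v, like_a, like_t, congruence_boost, n=2):
    """Posterior over (flashes, beeps, taps), with counts 1..n, combining each
    modality's likelihood with a prior that multiplies mass by
    `congruence_boost` when all three counts agree (an assumed prior form)."""
    post = {}
    for i, j, k in product(range(n), repeat=3):
        prior = congruence_boost if i == j == k else 1.0
        post[(i + 1, j + 1, k + 1)] = like_v[i] * like_a[j] * like_t[k] * prior
    z = sum(post.values())
    return {counts: p / z for counts, p in post.items()}
```

With vision weakly favoring one flash but audition and touch both favoring two events, the marginal over flashes tips toward two: two beeps plus two taps can pull a single flash toward being perceived as double, the trimodal analogue of the flash illusion.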
Two fundamental questions underlie the expression of behavior, namely what to do and how vigorously to do it. The former is the topic of an overwhelming wealth of theoretical and empirical work, particularly in the fields of reinforcement learning and decision-making, with various forms of affective prediction error playing key roles. Although vigor concerns motivation, and so is the subject of many empirical studies in diverse fields, it has suffered a dearth of computational models. Niv and colleagues [2007] suggested that vigor should be controlled by the opportunity cost of time, which is itself determined by the average rate of reward. This coupling of reward rate and vigor can be shown to be optimal under the theory of average-return reinforcement learning for a particular class of tasks, but may also be a more general, perhaps hard-wired, characteristic of the architecture of control. We therefore tested the hypothesis that healthy human participants would adjust their RTs on the basis of the average rate of reward. We measured RTs in an odd-ball discrimination task for rewards whose magnitudes varied slowly but systematically. Linear regression on the subjects' individual RTs, using the time-varying average rate of reward as the regressor of interest and including nuisance regressors such as the immediate reward in a round and in the preceding round, showed that a significant fraction of the variance in subjects' RTs could indeed be explained by the rate of experienced reward. This validates one of the key proposals associated with the model, illuminating an apparently mandatory form of coupling that may involve tonic levels of dopamine.
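The regression described above can be sketched as follows. The exponential smoothing used here to track the average reward rate is one simple assumption, not necessarily the estimator used in the study, and all names are illustrative:

```python
import numpy as np

def running_avg_reward(rewards, alpha=0.1):
    """Exponentially weighted running estimate of the average reward rate
    (the smoothing constant alpha is an assumption)."""
    r_bar, trace = 0.0, []
    for r in rewards:
        r_bar += alpha * (r - r_bar)
        trace.append(r_bar)
    return np.array(trace)

def vigor_regression(rt, avg_reward, reward_now, reward_prev):
    """Ordinary least squares of response times on the average reward rate,
    with the immediate and previous-round rewards as nuisance regressors."""
    X = np.column_stack([np.ones_like(rt), avg_reward, reward_now, reward_prev])
    beta, *_ = np.linalg.lstsq(X, rt, rcond=None)
    return beta  # beta[1] < 0 means faster responding when the reward rate is high
```

A negative coefficient on the average-reward regressor, over and above the nuisance terms, is the signature the abstract reports: higher experienced reward rates go with shorter RTs.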
It has been shown that human combination of cross-modal information is highly consistent with an optimal Bayesian model performing causal inference. These findings have shed light on the computational principles governing cross-modal integration and segregation. Intuitively, in a Bayesian framework, priors represent a priori information about the environment, i.e., information available before encountering the given stimuli, and are thus not dependent on the current stimuli. While many consider this interpretation a defining characteristic of Bayesian computation, Bayes' rule per se does not require that priors remain constant despite significant changes in the stimulus; therefore, demonstrating Bayes-optimality in a task does not imply the invariance of the priors to varying likelihoods. This issue has not been addressed before. Here, we empirically investigated the independence of the priors from the likelihoods by strongly manipulating the presumed likelihoods (using two drastically different sets of stimuli) and examining whether the estimated priors change or remain the same. The results suggest that the estimated prior probabilities are indeed independent of the immediate input and, hence, of the likelihood.
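The logical point, that the likelihood can change drastically while Bayes' rule leaves the prior untouched, can be made concrete with a toy discrete example (all numbers are illustrative):

```python
def discrete_posterior(prior, likelihood):
    """Bayes' rule on a discrete hypothesis space: posterior is proportional
    to prior times likelihood, then normalized."""
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

prior = [0.7, 0.3]  # P(common cause), P(independent causes): held fixed
post_set1 = discrete_posterior(prior, [0.20, 0.10])  # one stimulus set
post_set2 = discrete_posterior(prior, [0.01, 0.40])  # a drastically different set
# The two posteriors differ because the likelihoods differ; nothing in Bayes'
# rule forces the prior to stay fixed across stimulus sets, which is why its
# invariance must be tested empirically, as this study does.
```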