All adaptive organisms face the fundamental tradeoff between pursuing a known reward (exploitation) and sampling lesser-known options in search of something better (exploration). Theory suggests at least two strategies for solving this dilemma: a directed strategy in which choices are explicitly biased toward information seeking, and a random strategy in which decision noise leads to exploration by chance. In this work we investigated the extent to which humans use these two strategies. In our “Horizon task,” participants made explore–exploit decisions in two contexts that differed in the number of choices that they would make in the future (the time horizon). Participants made either a single choice in each game (horizon 1) or 6 sequential choices (horizon 6), the latter giving them more opportunity to explore. By modeling the behavior in these two conditions, we were able to measure exploration-related changes in decision making and quantify the contributions of the two strategies to behavior. We found that participants were more information seeking and had higher decision noise with the longer horizon, suggesting that humans use both strategies to solve the exploration–exploitation dilemma. We thus conclude that both information seeking and choice variability can be controlled and put to use in the service of exploration.
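The two strategies in this abstract can be captured in a simple choice rule: a directed bonus added to the value of lesser-known options, plus softmax decision noise that produces random exploration. This is a minimal sketch, not the authors' fitted model; the function name, the binary information indicator, and all parameter values are illustrative.

```python
import math
import random

def choose(mean_a, mean_b, info_a, info_b, bonus, noise, rng=random):
    """Pick option "a" or "b" using a directed information bonus plus
    softmax decision noise.

    mean_*: estimated mean reward of each option
    info_*: 1 if the option is lesser-known (gets the bonus), else 0
    bonus:  directed-exploration weight added to lesser-known options
    noise:  softmax temperature; larger values -> more random exploration
    """
    value_a = mean_a + bonus * info_a
    value_b = mean_b + bonus * info_b
    # Logistic (two-option softmax) choice probability for option "a".
    p_a = 1.0 / (1.0 + math.exp(-(value_a - value_b) / noise))
    return "a" if rng.random() < p_a else "b"
```

On this sketch, the abstract's finding corresponds to fitting larger values of both `bonus` (directed exploration) and `noise` (random exploration) in the horizon-6 condition than in the horizon-1 condition.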
When making decisions on the basis of past experiences, people must rely on their memories. Human memory has many well-known biases, including the tendency to better remember highly salient events. We propose an extreme-outcome rule, whereby this memory bias leads people to overweight the largest gains and largest losses, leading to more risk seeking for relative gains than for relative losses. To test this rule, in two experiments, people repeatedly chose between fixed and risky options, where the risky option led equiprobably to more or less than did the fixed option. As was predicted, people were more risk seeking for relative gains than for relative losses. In subsequent memory tests, people tended to recall the extreme outcome first and also judged the extreme outcome as having occurred more frequently. Across individuals, risk preferences in the risky-choice task correlated with these memory biases. This extreme-outcome rule presents a novel mechanism through which memory influences decision making.
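The extreme-outcome rule described above can be sketched as a weighted average in which the largest gain and largest loss in the experienced context receive extra weight. The function, the specific outcome values, and the weight parameter below are hypothetical; only the verbal rule comes from the abstract.

```python
def extreme_weighted_value(outcomes, context, extra_weight=2.0):
    """Average the outcomes experienced for one option, overweighting any
    outcome that is the most extreme (highest or lowest) in the context.

    outcomes:     outcomes experienced for this option
    context:      all outcomes experienced across options (sets the extremes)
    extra_weight: weight on extreme outcomes (ordinary outcomes get 1.0)
    """
    hi, lo = max(context), min(context)
    weights = [extra_weight if x in (hi, lo) else 1.0 for x in outcomes]
    return sum(w * x for w, x in zip(weights, outcomes)) / sum(weights)
```

With hypothetical outcomes in the style of such experiments (fixed +20 vs. risky +40/0 for gains; fixed -20 vs. risky -40/0 for losses), the rule values the risky gain above +20 and the risky loss below -20, reproducing greater risk seeking for gains than for losses.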
When faced with risky decisions, people tend to be risk averse for gains and risk seeking for losses (the reflection effect). Studies examining this risk-sensitive decision making, however, typically ask people directly what they would do in hypothetical choice scenarios. A recent flurry of studies has shown that when these risky decisions include rare outcomes, people make different choices for explicitly described probabilities than for experienced probabilistic outcomes. Specifically, rare outcomes are overweighted when described and underweighted when experienced. In two experiments, we examined risk-sensitive decision making when the risky option had two equally probable (50%) outcomes. For experience-based decisions, there was a reversal of the reflection effect with greater risk seeking for gains than for losses, as compared to description-based decisions. This fundamental difference in experienced and described choices cannot be explained by the weighting of rare events and suggests a separate subjective utility curve for experience.
Whether buying stocks or playing the slots, people making real-world risky decisions often rely on their experiences with the risks and rewards. These decisions, however, do not occur in isolation but are embedded in a rich context of other decisions, outcomes, and experiences. In this paper, we systematically evaluate how the local context of other rewarding outcomes alters risk preferences. Through a series of four experiments on decisions from experience, we provide evidence for an extreme-outcome rule, whereby people overweight the most extreme outcomes (highest and lowest) in a given context. As a result, people should be more risk seeking for gains than losses, even with equally likely outcomes. Across the experiments, the decision context was varied so that the same outcomes served as the high extreme, low extreme, or neither. As predicted, people were more risk seeking for relative gains, but only when the risky option potentially led to the high-extreme outcome. Similarly, people were more risk averse for relative losses, but only when the risky option potentially led to the low-extreme outcome. We conclude that in risky decisions from experience, the biggest wins and the biggest losses seem to matter more than they should.
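The context manipulation in these experiments can be illustrated directly: the same 50/50 option is valued differently depending on whether its high outcome is the context maximum, its low outcome is the context minimum, or neither outcome is extreme. The outcome values and the weighting function below are hypothetical, constructed only to mirror the abstract's verbal rule.

```python
def weighted_mean(outcomes, context, extra=2.0):
    """Mean of an option's outcomes, overweighting whichever outcomes are
    the highest or lowest in the surrounding decision context."""
    hi, lo = max(context), min(context)
    w = [extra if x in (hi, lo) else 1.0 for x in outcomes]
    return sum(wi * x for wi, x in zip(w, outcomes)) / sum(w)

risky = [20, 60]  # one risky option: two equally likely outcomes, mean 40

# Same option, three contexts:
high_extreme = weighted_mean(risky, [0, 20, 40, 60])      # 60 is the context max
low_extreme = weighted_mean(risky, [20, 40, 60, 80])      # 20 is the context min
neither = weighted_mean(risky, [0, 20, 40, 60, 80])       # neither is extreme
```

The rule predicts the option is overvalued (risk seeking) only when it can deliver the high extreme, undervalued (risk aversion) only when it can deliver the low extreme, and valued at its plain mean otherwise, matching the pattern reported across the four experiments.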
The phasic firing of dopamine neurons has been theorized to encode a reward-prediction error as formalized by the temporal-difference (TD) algorithm in reinforcement learning. Most TD models of dopamine have assumed a stimulus representation, known as the complete serial compound, in which each moment in a trial is distinctly represented. We introduce a more realistic temporal stimulus representation for the TD model. In our model, all external stimuli, including rewards, spawn a series of internal microstimuli, which grow weaker and more diffuse over time. These microstimuli are used by the TD learning algorithm to generate predictions of future reward. This new stimulus representation injects temporal generalization into the TD model and enhances correspondence between model and data in several experiments, including those when rewards are omitted or received early. This improved fit mostly derives from the absence of large negative errors in the new model, suggesting that dopamine alone can encode the full range of TD errors in these situations.
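One way to generate microstimuli matching this description is to pass an exponentially decaying memory trace through a bank of Gaussian basis functions, so that later microstimuli are both weaker and more spread out in time. This is a sketch under that assumption; the number of microstimuli, decay rate, and basis width are illustrative constants, not values from the paper.

```python
import math

def microstimuli(t, n=10, decay=0.985, width=0.08):
    """Levels of n microstimuli, t time steps after a stimulus onset.

    A memory trace y decays exponentially from 1. Each microstimulus is a
    Gaussian bump in trace height (centers spaced over (0, 1]), scaled by
    the trace itself, so microstimuli that peak later in the trial are
    weaker and more temporally diffuse.
    """
    y = decay ** t
    return [y * math.exp(-((y - (i + 1) / n) ** 2) / (2 * width ** 2))
            for i in range(n)]
```

A TD learner would then predict future reward as a weighted sum of these microstimulus levels, with the weights updated by the usual TD error.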
Habits form a crucial component of behavior. In recent years, key computational models have conceptualized habits as arising from model-free reinforcement learning mechanisms, which typically select between available actions based on the future value expected to result from each. Traditionally, however, habits have been understood as behaviors that can be triggered directly by a stimulus, without requiring the animal to evaluate expected outcomes. Here, we develop a computational model instantiating this traditional view, in which habits develop through the direct strengthening of recently taken actions rather than through the encoding of outcomes. We demonstrate that this model accounts for key behavioral manifestations of habits, including insensitivity to outcome devaluation and contingency degradation, as well as the effects of reinforcement schedule on the rate of habit formation. The model also explains the prevalent observation of perseveration in repeated-choice tasks as an additional behavioral manifestation of the habit system. We suggest that mapping habitual behaviors onto value-free mechanisms provides a parsimonious account of existing behavioral and neural data. This mapping may provide a new foundation for building robust and comprehensive models of the interaction of habits with other, more goal-directed types of behaviors and help to better guide research into the neural mechanisms underlying control of instrumental behavior more generally.
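The core idea of such a value-free mechanism, strengthening recently taken actions without reference to their outcomes, can be sketched in a few lines. The update rule below is a minimal illustration of that principle, not the authors' full model; names and the learning rate are assumptions.

```python
def update_habits(habits, chosen, rate=0.1):
    """Strengthen the habit for the action just taken and weaken the rest.

    Note that no outcome or reward term appears anywhere in the update:
    the rule is value-free, driven only by which action was emitted.

    habits: dict mapping action name -> habit strength
    chosen: the action that was just taken
    rate:   step size of the strengthening
    """
    for action in habits:
        target = 1.0 if action == chosen else 0.0
        habits[action] += rate * (target - habits[action])
    return habits
```

Because the update ignores outcomes, repeatedly emitting one action drives its habit strength toward 1 and keeps it there even if the outcome is later devalued, which is how such a mechanism can produce devaluation insensitivity and perseveration.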
Pigeons and other animals sometimes deviate from optimal choice behavior when given informative signals for delayed outcomes. For example, when pigeons are given a choice between an alternative that always leads to food after a delay and an alternative that leads to food only half of the time after a delay, preference changes dramatically depending on whether the stimuli during the delays are correlated with (signal) the outcomes or not. With signaled outcomes, pigeons show a much greater preference for the suboptimal alternative than with unsignaled outcomes. Key variables and research findings related to this phenomenon are reviewed, including the effects of durations of the choice and delay periods, probability of reinforcement, and gaps in the signal. We interpret the available evidence as reflecting a preference induced by signals for good news in a context of uncertainty. Other explanations are briefly summarized and compared.