Humans choose actions based on both habit and planning. Habitual control is computationally frugal but adapts slowly to novel circumstances, whereas planning is computationally expensive but can adapt swiftly. Current research emphasizes the competition between habits and plans for behavioral control, yet many complex tasks instead favor their integration. We consider a hierarchical architecture that exploits the computational efficiency of habitual control to select goals while preserving the flexibility of planning to achieve those goals. We formalize this mechanism in a reinforcement learning setting, illustrate its costs and benefits, and experimentally demonstrate its spontaneous application in a sequential decision-making task. The distinction between habitual and planned action is fundamental to behavioral research (1-4). Habits enable computationally efficient decision making, but at the cost of behavioral flexibility. They form as stimulus-response pairings are "stamped in" following reward, as in Thorndike's law of effect (3). Planning, in contrast, enables more flexible and productive decision making. It is accomplished by first searching over a causal model linking candidate actions to their expected outcomes and then selecting actions based on their anticipated rewards. Planning imposes a severe computational cost, however, as the size and complexity of a model grows. Past research emphasizes the competition between habitual and planned control of behavior (5, 6). Habitual control is favored when an individual has extensive experience with a task and when the optimal behavior policy is relatively consistent across time; meanwhile, planning is favored for novel tasks and when the optimal policy is variable, provided that an agent represents an adequate model of their task (7). Methods of integrating habitual and planned control have received less attention (8-10), yet real-world tasks often favor elements of each.
Consider, for instance, a seasoned journalist who reports on new events each day. At a high level of abstraction, her reporting is structured around a repetitive series of goal-directed actions: follow leads, interview sources, evade meddling editors, etc. Because these actions are reliably valuable for any news event, their selection is an excellent candidate for habitual control. The concrete steps necessary to carry out any individual action will be highly variable, however: optimal behavior when interviewing a pop star may be suboptimal when interviewing the Pope. Thus, the implementation of the abstract actions is an excellent candidate for planning. This example illustrates the utility of nesting elements of both habits and plans in a hierarchy of behavioral control (11-13). Indeed, it is widely recognized that humans mentally organize their behavior around hierarchically organized goals and subgoals (3, 14, 15). In principle, hierarchical organization can be implemented exclusively by habitual control (16), or exclusively by planning (13, 17). However, these homogenous mechanisms foreclose...
Humans often represent and reason about unrealized possible actions: the vast infinity of things that were not (or have not yet been) chosen. This capacity is central to the most impressive of human abilities: causal reasoning, planning, linguistic communication, moral judgment, etc. Nevertheless, how do we select possible actions that are worth considering from the infinity of unrealized actions that are better left ignored? We review research across the cognitive sciences, and find that the possible actions considered by default are those that are both likely to occur and generally valuable. We then offer a unified theory of why. We propose that (i) across diverse cognitive tasks, the possible actions we consider are biased towards those of general practical utility, and (ii) a plausible primary function for this mechanism resides in decision making.
Humans have a remarkable capacity for flexible decision-making, deliberating among actions by modeling their likely outcomes. This capacity allows us to adapt to the specific features of diverse circumstances. In real-world decision-making, however, people face an important challenge: There are often an enormous number of possibilities to choose among, far too many for exhaustive consideration. There is a crucial, understudied prechoice step in which, among myriad possibilities, a few good candidates come quickly to mind. How do people accomplish this? We show across nine experiments (N = 3,972 U.S. residents) that people use computationally frugal cached value estimates to propose a few candidate actions on the basis of their success in past contexts (even when irrelevant for the current context). Deliberative planning is then deployed just within this set, allowing people to compute more accurate values on the basis of context-specific criteria. This hybrid architecture illuminates how typically valuable thoughts come quickly to mind during decision-making.
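The two-stage process described in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the function names, the top-k selection rule, and all numeric values are assumptions made purely for illustration. Cached, context-general values nominate a small candidate set, and a context-specific evaluation (standing in for deliberative planning) is applied only within that set.

```python
from typing import Callable, Dict, List


def propose_candidates(cached_values: Dict[str, float], k: int) -> List[str]:
    """Cheap step: return the k actions with the highest cached value.

    Top-k selection is an illustrative assumption, not a claim about
    how people actually sample candidates.
    """
    return sorted(cached_values, key=cached_values.get, reverse=True)[:k]


def choose(
    cached_values: Dict[str, float],
    context_value: Callable[[str], float],
    k: int = 3,
) -> str:
    """Expensive step: evaluate only the proposed candidates in context."""
    candidates = propose_candidates(cached_values, k)
    return max(candidates, key=context_value)


# Hypothetical toy values: how good each meal has been in general (cached)
# versus how good it is under the current, context-specific criterion.
cached = {"pasta": 0.9, "salad": 0.7, "soup": 0.6, "sushi": 0.2, "stew": 0.1}
context = {"pasta": 0.1, "salad": 0.5, "soup": 0.8, "sushi": 1.0, "stew": 0.3}

best = choose(cached, lambda action: context[action], k=3)
print(best)
```

In this toy example the context-optimal option ("sushi") is never evaluated because its cached value is low, which mirrors the abstract's point that candidates are proposed on the basis of past success even when that is irrelevant for the current context.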
When many events contributed to an outcome, people consistently judge some more causal than others, based in part on the prior probabilities of those events. For instance, when a tree bursts into flames, people judge the lightning strike more of a cause than the presence of oxygen in the air—in part because oxygen is so common, and lightning strikes are so rare. These effects, which play a major role in several prominent theories of token causation, have largely been studied through qualitative manipulations of the prior probabilities. Yet, there is good reason to think that people’s causal judgments are on a continuum—and relatively little is known about how these judgments vary quantitatively as the prior probabilities change. In this paper, we measure people’s causal judgment across parametric manipulations of the prior probabilities of antecedent events. Our experiments replicate previous qualitative findings, and also reveal several novel patterns that are not well-described by existing theories.
Abstract: Humans often comply with social norms, but the reasons why are disputed. Here, we unify a variety of influential explanations in a common decision framework, and identify the precise cognitive variables that norms might alter to induce compliance. Specifically, we situate current theories of norm compliance within the reinforcement learning framework, which is widely used to study value-guided learning and decision-making. This framework offers an appealingly precise language to distinguish between theories, highlights the various points of convergence and divergence, and suggests novel ways in which norms might penetrate our psychology.
Natural selection designs some social behaviors to depend on flexible learning processes, whereas others are relatively rigid or reflexive. What determines the balance between these two approaches? We offer a detailed case study in the context of a two-player game with antisocial behavior and retaliatory punishment. We show that each player in this game, a "thief" and a "victim," must balance two competing strategic interests. Flexibility is valuable because it allows adaptive differentiation in the face of diverse opponents. However, it is also risky because, in competitive games, it can produce systematically suboptimal behaviors. Using a combination of evolutionary analysis, reinforcement learning simulations, and behavioral experimentation, we show that the resolution to this tension (and the adaptation of social behavior in this game) hinges on the game's learning dynamics. Our findings clarify punishment's adaptive basis, offer a case study of the evolution of social preferences, and highlight an important connection between natural selection and learning in the resolution of social conflicts. punishment | evolution | reinforcement learning | game theory | commitment. Human social behavior is sometimes remarkably rigid, and other times remarkably flexible. A key challenge for evolutionary theory is to understand why. That is, when will natural selection favor "reflexive" social behaviors, and when will it instead favor more flexible processes that guide social decision making by learning? We investigate a case study of this problem that illuminates some general principles of the evolution of social cognition. Specifically, we model the dynamic between antisocial behavior and retaliatory punishment in repeated relationships. Our goal is to understand when natural selection will favor flexibility (e.g., "try stealing and see if you can get away with it") versus rigidity ("punish thieves no matter what").
We approach this question through both a game-theoretic model of punishment and agent-based simulations that allow for the evolution of the rewards that guide learning. We demonstrate that the evolution of punishment depends on the learning dynamics of competing flexible agents, and that this interaction between learning and evolution can produce individuals with innate "social preferences," such as a taste for revenge (1-4). The Evolution of Retaliatory Punishment. Individuals often punish those who harm them, even at a cost to themselves (5, 6). The adaptive rationale of this behavior seems clear in repeated or reputational interactions: Punishment promises a long-run gain by deterring social partners from doing future harm. This logic was classically formalized with a simple two-party repeated game (5) (Fig. 1A). On each round, a thief has the option to either steal from a victim (earning s and inflicting a cost −s) or do nothing. In response, the victim may either punish (paying a cost −c to inflict a cost −p) or do nothing. Formal analysis shows that "punish all theft/stop stealing from victims who punish" is ev...
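The single-round payoffs of the thief/victim game described above can be written out as a small sketch. This is an illustration under stated assumptions rather than the authors' simulation code: the function name and the numeric values of s, c, and p are hypothetical, and the sketch assumes that punishment is available to the victim whether or not theft occurred.

```python
from typing import Tuple


def round_payoffs(
    steal: bool, punish: bool, s: float, c: float, p: float
) -> Tuple[float, float]:
    """Return (thief_payoff, victim_payoff) for one round.

    s: amount gained by the thief and lost by the victim when theft occurs
    c: cost the victim pays to punish
    p: cost that punishment inflicts on the thief
    """
    thief = 0.0
    victim = 0.0
    if steal:
        thief += s   # thief earns s
        victim -= s  # victim loses s
    if punish:
        victim -= c  # punishing is costly for the victim
        thief -= p   # and inflicts a cost on the thief
    return thief, victim


# Hypothetical parameter values chosen only for illustration.
s, c, p = 2.0, 1.0, 3.0
for steal in (False, True):
    for punish in (False, True):
        print(steal, punish, round_payoffs(steal, punish, s, c, p))
```

With these illustrative values, theft pays within a round only if it goes unpunished, while punishing always lowers the victim's payoff in that round. Any benefit to the victim therefore has to come from deterring future theft, which is the long-run logic the passage describes.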
Humans have a remarkable capacity for flexible planning, deliberating among actions by modeling their likely outcomes. This form of model-based planning allows us to adapt to the specific features of diverse circumstances. In real-world decision making, however, planning faces an important challenge: There are often an enormous number of possible actions to choose among, far too many for exhaustive consideration. There is a crucial, understudied “pre-planning” step in which, among myriad possibilities, a few good candidates come to mind with minimal effort. How do people accomplish this? We show that people use computationally frugal habits to propose a few candidate actions based on their general value across a range of contexts. Deliberative planning is then deployed just within this tractable set, updating value estimates based on context-specific features. This hybrid architecture combines the efficiency of habit and accuracy of planning, illuminating how valuable thoughts come naturally to mind during decision-making.