Grid cells in the entorhinal cortex appear to represent spatial location via a triangular coordinate system. Such cells, which have been identified in rats, bats, and monkeys, are believed to support a wide range of spatial behaviors. By recording neuronal activity from neurosurgical patients performing a virtual-navigation task, we identified cells exhibiting grid-like spiking patterns in the human brain, suggesting that humans and simpler animals rely on homologous spatial-coding schemes.
In many species, spatial navigation is supported by a network of place cells that exhibit increased firing whenever an animal is in a certain region of an environment. Does this neural representation of location form part of the spatiotemporal context into which episodic memories are encoded? We recorded medial temporal lobe neuronal activity as epilepsy patients performed a hybrid spatial and episodic memory task. We identified place-responsive cells active during virtual navigation and then asked whether the same cells activated during the subsequent recall of navigation-related memories without actual navigation. Place-responsive cell activity was reinstated during episodic memory retrieval. Neuronal firing during the retrieval of each memory was similar to the activity that represented the locations in the environment where the memory was initially encoded.
Recent work has given rise to the view that reward-based decision making is governed by two key controllers: a habit system, which stores stimulus-response associations shaped by past reward, and a goal-oriented system that selects actions based on their anticipated outcomes. The current literature provides a rich body of computational theory addressing habit formation, centering on temporal-difference learning mechanisms. Less progress has been made toward formalizing the processes involved in goal-directed decision making. We draw on recent work in cognitive neuroscience, animal conditioning, cognitive and developmental psychology, and machine learning to outline a new theory of goal-directed decision making. Our basic proposal is that the brain, within an identifiable network of cortical and subcortical structures, implements a probabilistic generative model of reward, and that goal-directed decision making is effected through Bayesian inversion of this model. We present a set of simulations implementing the account, which address benchmark behavioral and neuroscientific findings, and which give rise to a set of testable predictions. We also discuss the relationship between the proposed framework and other models of decision making, including recent models of perceptual choice, to which our theory bears a direct connection.
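The temporal-difference mechanism that the habit-system literature centers on can be illustrated with a minimal sketch. The three-state chain, reward placement, and parameter values below are illustrative assumptions, not taken from the paper:

```python
def td_learn(n_episodes=500, alpha=0.1, gamma=0.9):
    """Tabular TD(0) value learning on a toy chain 0 -> 1 -> 2 (terminal)."""
    V = [0.0, 0.0, 0.0]  # value estimates for states 0, 1, 2
    for _ in range(n_episodes):
        s = 0
        while s < 2:
            s_next = s + 1
            r = 1.0 if s_next == 2 else 0.0  # reward only at the terminal state
            delta = r + gamma * V[s_next] - V[s]  # TD prediction error
            V[s] += alpha * delta  # nudge the estimate toward the bootstrapped target
            s = s_next
    return V

values = td_learn()  # converges toward V[1] = 1.0 and V[0] = gamma * V[1] = 0.9
```

The key quantity is `delta`, the prediction error: habit-formation theories identify this signal with phasic dopamine responses, and it is the quantity the hierarchical extension discussed below generalizes to subgoals.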
Human behavior has long been recognized to display hierarchical structure: actions fit together into subtasks, which cohere into extended goal-directed activities. Arranging actions hierarchically has well-established benefits, allowing behaviors to be represented efficiently by the brain, and allowing solutions to new tasks to be discovered easily. However, these payoffs depend on the particular way in which actions are organized into a hierarchy, the specific way in which tasks are carved up into subtasks. We provide a mathematical account of what makes some hierarchies better than others, an account that allows an optimal hierarchy to be identified for any set of tasks. We then present results from four behavioral experiments, suggesting that human learners spontaneously discover optimal action hierarchies.
The well-known finding that responses in serial recall tend to be clustered around the position of the target item has bolstered positional-coding theories of serial order memory. In the present study, we show that this effect is confounded with another well-known finding—that responses in serial recall also tend to be clustered around the position of the prior recall (temporal clustering). The confound can be alleviated by conditioning each analysis on the positional accuracy of the previously recalled item. The revised analyses show that temporal clustering is much more prevalent in serial recall than is positional clustering. A simple associative chaining model with asymmetric neighboring and remote associations and a primacy gradient can account for these effects. Using the same parameter values, the model produces reasonable serial position curves and captures the changes in item and order information across study-test trials. In contrast, a prominent positional coding model cannot account for the pattern of clustering uncovered by the new analyses.
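The three ingredients the abstract names—asymmetric associations, remote (lag > 1) associations, and a primacy gradient—can be sketched in a toy chaining model. The list length, parameter values, and deterministic recall rule below are illustrative assumptions, not the model as fit in the paper:

```python
import numpy as np

def build_associations(n=6, fwd=1.0, bwd=0.5, decay=0.6, primacy=0.95):
    """Item-to-item association matrix A[i, j] for a list of n studied items."""
    A = np.zeros((n, n))
    for i in range(n):
        w = primacy ** i  # primacy gradient: earlier items encoded more strongly
        for j in range(n):
            if i == j:
                continue
            base = fwd if j > i else bwd  # asymmetry: forward links stronger
            A[i, j] = w * base * decay ** (abs(i - j) - 1)  # remote links decay with lag
    return A

def recall(A):
    """Deterministic chained recall: each output cues the next via strongest link."""
    n = A.shape[0]
    order, current = [0], 0  # assume recall starts with the first list item
    while len(order) < n:
        scores = A[current].copy()
        scores[order] = -np.inf  # suppress already-recalled items
        current = int(np.argmax(scores))
        order.append(current)
    return order
```

With these parameters each recall cues its forward neighbor, so the toy model reproduces in-order recall; temporal clustering around the prior recall falls out of the cueing rule itself rather than from positional codes.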
Human behavior displays hierarchical structure: Simple actions cohere into subtask sequences, which work together to accomplish overall task goals. Although the neural substrates of such hierarchy have been the target of increasing research, they remain poorly understood. We propose that the computations supporting hierarchical behavior may relate to those in hierarchical reinforcement learning (HRL), a machine learning framework that extends reinforcement learning mechanisms into hierarchical domains. To test this, we leveraged a distinctive prediction arising from HRL. In ordinary reinforcement learning, reward prediction errors are computed when there is an unanticipated change in the prospects for accomplishing overall task goals. HRL entails that prediction errors should also occur in relation to task subgoals. In three neuroimaging studies, we observed neural responses consistent with such subgoal-related reward prediction errors, within structures previously implicated in reinforcement learning. The results reported support the relevance of HRL to the neural processes underlying hierarchical behavior.
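The distinctive HRL prediction the studies test—an extra prediction error keyed to subgoal attainment, alongside the standard one for the overall goal—can be sketched as follows. The trajectory format, value table, and pseudo-reward magnitude are illustrative assumptions:

```python
def prediction_errors(trajectory, V, gamma=0.9, pseudo_reward=0.5):
    """Return (state, delta, kind) tuples for each transition.

    trajectory: list of (state, next_state, reward, at_subgoal) tuples.
    V: dict of per-state value estimates (assumed given, for illustration).
    """
    errors = []
    for s, s_next, r, at_subgoal in trajectory:
        delta = r + gamma * V[s_next] - V[s]  # standard reward prediction error
        errors.append((s, delta, "goal"))
        if at_subgoal:
            # HRL posits an additional error driven by a subgoal pseudo-reward,
            # computed even when the overall-goal prospects are unchanged
            errors.append((s, pseudo_reward - V[s], "subgoal"))
    return errors
```

The empirical signature is the second kind of error: a neural response at subgoal attainment that ordinary, flat reinforcement learning would not predict.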
Research on the dynamics of reward-based, goal-directed decision making has largely focused on simple choice, where participants decide among a set of unitary, mutually exclusive options. Recent work suggests that the deliberation process underlying simple choice can be understood in terms of evidence integration: Noisy evidence in favor of each option accrues over time, until the evidence in favor of one option is significantly greater than the rest. However, real-life decisions often involve not one, but several steps of action, requiring a consideration of cumulative rewards and a sensitivity to recursive decision structure. We present results from two experiments that leveraged techniques previously applied to simple choice to shed light on the deliberation process underlying multistep choice. We interpret the results from these experiments in terms of a new computational model, which extends the evidence accumulation perspective to multiple steps of action.

reward-based decision making | drift-diffusion model | reinforcement learning

Imagine a customer standing at the counter in an ice cream shop, deliberating among the available flavors. Such a scenario exemplifies "simple choice," a decision situation in which the objective is to select among a set of individual, immediate outcomes, each carrying a different reward. Simple choice, in this sense, has provided a convenient focus for a great deal of work in behavioral economics and decision neuroscience (1-5). However, it would be an obvious mistake to treat it as an exhaustive model of reward-based decision making. The decisions that arise in everyday life are of course often more complicated. One important difference, among others, is that everyday decisions tend to involve sequences of actions and outcomes. As an illustration, let us return to the ice cream customer, picturing him at a point slightly earlier in the day, exiting his home in quest of something sweet.
Upon reaching the sidewalk, he faces a decision between heading left toward the ice cream shop, or heading right toward a frozen yogurt shop. If he wishes to fully evaluate the relative appeal of these two options, he must answer a second set of questions: Which flavor would he choose in each shop? Furthermore, it may be relevant for him to consider more immediate consequences of the left-right decision. For example, the leftward path might pass by a bank, allowing him to deposit a check along his way, whereas the rightward path might lead by the post office, giving him the opportunity to mail a package. Rather than selecting among individual and immediate outcomes, the decision maker in this scenario finds himself at the root of a decision tree (Fig. 1A), with nodes corresponding to value-laden outcomes or states, and edges corresponding to choice-induced state transitions. Deciding among immediate actions, even at the first branch point, requires a consideration of all of the paths that unfold below. Decision making thus assumes the form of reward-based tree search (6-10). Note that decision making...
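The two ingredients this passage combines—backing cumulative reward up through a decision tree, and drift-diffusion evidence accumulation between the resulting root options—can be sketched together. The toy tree, reward values, and diffusion parameters below are illustrative assumptions, not the model or stimuli from the experiments:

```python
import random

def tree_value(node):
    """node = (reward, children). Path value = own reward + best child's value."""
    reward, children = node
    if not children:
        return reward
    return reward + max(tree_value(c) for c in children)

def ddm_choice(v_left, v_right, threshold=1.0, noise=0.5, dt=0.01, seed=0):
    """Drift-diffusion race between two options; drift scales with value difference."""
    rng = random.Random(seed)
    x = 0.0  # accumulated evidence; positive favors the left option
    drift = v_left - v_right
    while abs(x) < threshold:
        x += drift * dt + noise * (dt ** 0.5) * rng.gauss(0.0, 1.0)
    return "left" if x > 0 else "right"

# Toy version of the sidewalk scenario: left passes the bank toward the ice
# cream shop; right passes the post office toward the frozen yogurt shop.
left = (0.2, [(1.0, []), (0.6, [])])
right = (0.1, [(0.7, []), (0.5, [])])
choice = ddm_choice(tree_value(left), tree_value(right))
```

In this sketch, the tree is fully evaluated before diffusion begins; the interesting modeling question the paper raises is precisely how evaluation and accumulation interleave across the multiple steps of the tree.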