To behave adaptively, we must learn from the consequences of our actions. Studies using event-related potentials (ERPs) have been informative about how such learning occurs. These studies have revealed a frontocentral negativity termed the feedback-related negativity (FRN) that appears after negative feedback. According to one prominent theory, the FRN tracks the difference between the values of actual and expected outcomes, or reward prediction errors. As such, the FRN provides a tool for studying reward valuation and decision making. We begin this review by examining the neural significance of the FRN, and then turn to its functional significance. To understand the cognitive processes that occur when the FRN is generated, we explore variables that influence its appearance and amplitude. Specifically, we evaluate four hypotheses: (1) the FRN encodes a quantitative reward prediction error; (2) the FRN is evoked by outcomes and by stimuli that predict outcomes; (3) the FRN and behavior change with experience; and (4) the system that produces the FRN is maximally engaged by volitional actions.
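The prediction-error account can be stated concretely. Below is a minimal sketch, in Python, of how a reward prediction error is computed and used to revise an outcome expectation; the function name, learning rate, and example values are illustrative assumptions rather than anything specified in the review.

```python
# Minimal sketch of a reward prediction error (RPE) and a
# Rescorla-Wagner-style value update. All names and parameter
# values here are illustrative assumptions.

def update_value(expected: float, reward: float, alpha: float = 0.1):
    """Return the RPE and the revised outcome expectation."""
    rpe = reward - expected   # actual minus expected outcome
    expected += alpha * rpe   # expectations move toward outcomes
    return rpe, expected

# Under the RPE account, FRN amplitude tracks the sign and size of rpe:
# surprising losses yield large negative RPEs, expected rewards small ones.
expected = 0.5
for reward in [1.0, 1.0, 0.0, 1.0]:
    rpe, expected = update_value(expected, reward)
    print(f"RPE = {rpe:+.2f}, new expectation = {expected:.2f}")
```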
We introduce a method for measuring the number and durations of processing stages from the electroencephalographic (EEG) signal and apply it to the study of associative recognition. Using an extension of past research that combines multivariate pattern analysis (MVPA) with hidden semi-Markov models (HSMMs), the approach identifies, on a trial-by-trial basis, where brief sinusoidal peaks (called bumps) are added to the ongoing EEG signal. We propose that these bumps mark the onset of critical cognitive stages in processing. The results of the analysis can be used to guide the development of detailed process models. Applied to the associative recognition task, the HSMM-MVPA method indicates that the effects of associative strength and probe type are localized to a memory retrieval stage and a decision stage. This is in line with a previously developed ACT-R process model of the task. As a test of the generalization of our method, we also apply it to a data set on the Sternberg working memory task collected by Jacobs et al. (2006). The analysis generalizes robustly and localizes the typical set-size effect in a late comparison/decision stage. In addition to providing information about the number and durations of stages in associative recognition, our analysis sheds light on the ERP components implicated in the study of recognition memory.
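To illustrate the bump idea, here is a brief sketch of the signal model: a half-sine peak superimposed on ongoing noise. The sampling rate, bump width, onset, and amplitude are assumed values for illustration; this is not the authors' analysis code.

```python
import numpy as np

# Illustrative sketch (not the authors' code): a "bump" as a brief
# half-sine peak superimposed on ongoing EEG noise. Sampling rate,
# bump width, and amplitudes are assumed values.
fs = 100                      # samples per second
bump_width = 5                # ~50 ms bump at 100 Hz
bump = np.sin(np.pi * np.arange(bump_width) / (bump_width - 1))

rng = np.random.default_rng(0)
trial = rng.normal(0.0, 1.0, size=fs)     # 1 s of ongoing "EEG" noise

onset = 40                    # hypothetical stage-onset sample
trial[onset:onset + bump_width] += 3.0 * bump

# An HSMM-MVPA analysis would estimate, per trial, where such bumps
# occur, treating the intervals between them as processing stages
# with variable durations.
```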
A great deal of research focuses on how humans and animals learn from trial-and-error interactions with the environment. This research has established the viability of reinforcement learning as a model of behavioral adaptation and neural reward valuation. Error-driven learning is inefficient and dangerous, however. Fortunately, humans also learn from nonexperiential sources of information. In the present study, we focused on one such source: instruction. We recorded event-related potentials as participants performed a probabilistic learning task. In one condition, participants received feedback only about whether their responses were rewarded. In the other condition, they also received instruction about reward probabilities before performing the task. We found that instruction eliminated participants' reliance on feedback, as evidenced by their immediate asymptotic performance in the instruction condition. In striking contrast, the feedback-related negativity, an event-related potential component thought to reflect neural reward prediction error, continued to adapt with experience in both conditions. These results show that, whereas instruction may immediately control behavior, certain neural responses must be learned from experience.

Reinforcement learning (RL) formalizes the notion that humans and animals learn from trial-and-error interactions with the environment (1). According to many RL models, differences between actual and expected outcomes, or reward prediction errors, provide teaching signals. These signals convey information about the magnitude and valence of the difference between actual and expected rewards. By using reward prediction errors to revise expectations, RL models increasingly select advantageous actions. Behavioral studies furnished early support for RL in the form of the "law of effect" (2). This law states that actions followed by rewards will be repeated. Single-cell recordings from animals provided further support by showing that responses of midbrain dopamine neurons to outcomes scale with the differences between actual and expected rewards (3). Neuroimaging experiments have since extended this result to humans by demonstrating that blood-oxygen level-dependent (BOLD) responses in the striatum and prefrontal cortex also mirror reward prediction errors (4).

On the basis of these findings, RL has emerged as a prominent theory of behavioral adaptation and neural reward valuation. As it stands, however, RL is an incomplete theory: individuals learn from nonexperiential sources of information as well. For example, by using language to acquire knowledge about outcome likelihoods, humans can avoid costly mistakes. This raises the question: how does information provided by instruction mediate trial-and-error learning?

Several theories seek to explain how the brain uses instruction and experience to select actions (5-8). These theories agree that instruction engages the prefrontal cortex and medial temporal lobes (PFC/MTL), whereas experience engages ...
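The reported dissociation, immediate behavioral control by instruction alongside gradual neural adaptation, can be sketched with a simple delta-rule learner. The reward probability, learning rate, and trial count below are assumptions for illustration.

```python
import random

# Hypothetical sketch (not the study's code): instruction can control
# choice immediately, while the prediction-error system still learns
# its expectation from experience. Reward probability, learning rate,
# and trial count are assumed values.

def simulate(p_reward: float = 0.8, trials: int = 50, alpha: float = 0.2):
    v = 0.5                      # experiential expectation starts neutral
    for t in range(trials):
        reward = 1.0 if random.random() < p_reward else 0.0
        rpe = reward - v         # FRN analogue: computed in BOTH conditions
        v += alpha * rpe         # adapts with experience regardless
        yield t, rpe

# Instructed participants can choose the better option from trial 1,
# yet rpe (and, by hypothesis, FRN amplitude) shrinks only gradually
# as v converges on p_reward.
for t, rpe in simulate():
    if t % 10 == 0:
        print(f"trial {t:2d}: RPE = {rpe:+.2f}")
```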
When feedback follows a sequence of decisions, relationships between actions and outcomes can be difficult to learn. We used event-related potentials (ERPs) to understand how people overcome this temporal credit-assignment problem. Participants performed a sequential decision task that required two decisions on each trial. The first decision led to an intermediate state that was predictive of the trial outcome, and the second decision was followed by positive or negative trial feedback. The feedback-related negativity (fERN), a component thought to reflect reward prediction error, followed negative feedback and negative intermediate states. This suggests that participants evaluated intermediate states in terms of expected future reward, and that these evaluations supported learning of earlier actions within sequences. We examined the predictions of several temporal-difference models to determine whether the behavioral and ERP results reflected a reinforcement-learning process.
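A worked TD(0) example shows how an intermediate state can generate a fERN-like signal before any external feedback arrives. The state names and parameter values below are illustrative assumptions, not the specific models evaluated in the paper.

```python
from typing import Optional

# Minimal temporal-difference (TD) sketch for a two-step trial like the
# one above. States, rewards, and parameters are illustrative assumptions.
alpha, gamma = 0.1, 1.0
V = {"start": 0.0, "good_state": 0.0, "bad_state": 0.0}

def td_update(state: str, next_state: Optional[str], reward: float) -> float:
    """Apply a TD(0) update to V[state] and return the TD error."""
    v_next = V[next_state] if next_state else 0.0
    delta = reward + gamma * v_next - V[state]
    V[state] += alpha * delta
    return delta

# First decision: no external feedback yet, but entering a low-valued
# intermediate state produces a negative TD error once that state has
# acquired value; this is how a fERN can follow intermediate states.
delta1 = td_update("start", "bad_state", reward=0.0)
# Second decision: terminal feedback delivers the trial outcome.
delta2 = td_update("bad_state", None, reward=0.0)
print(delta1, delta2, V)
```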
In two experiments, we studied how people's strategy choices emerge through an initial and then a more considered evaluation of available strategies. The experiments employed a computer-based paradigm in which participants solved multiplication problems using mental and calculator solutions. In addition to recording responses and solution times, we gathered data on mouse cursor movements. Participants' motor behavior was revealing: although people rapidly initiated movement to the calculator box or the answer input box, they frequently changed their minds and went to the other box. Movement initiation direction depended on problem difficulty and calculator responsiveness. Ultimate strategy selection also depended on these factors, but was further influenced by movement initiation direction. We conclude that strategy selection is iterative, as revealed by these differences between early cursor movement and eventual strategy implementation. After rapidly initiating movement favoring one strategy, people carefully evaluate the applicability of that strategy in the current context.

The question of how people select among problem-solving strategies is central to psychological research. Although this question has received substantial attention, researchers have primarily studied situations in which strategies are selected by a single irrevocable action. In many situations, however, the physical act of executing one strategy provides an opportunity to consider the wisdom of that choice. While searching for a calculator, one can instead decide to compute the tip mentally; while commuting along a congested roadway, one can consider the speed of alternate routes; while composing a snippy response to an email, one can reflect on whether it is prudent to respond at all. Might the need to second-guess oneself, and the second guess itself, be adaptive? In this paper, we explore this question by considering subtleties of the mouse-based movements that people make while selecting and implementing strategies at a computer interface. We will show that as people implement a strategy, they continue to consider whether that strategy is truly preferable. We will show that both the rapid initial selection and the further consideration of a solution method are sensitive to the relative utility of available strategies.

Many formal models of human problem solving posit the existence of a strategy selection phase (Lovett & Anderson, 1996; Lovett & Schunn, 1999; Payne, Bettman, & Johnson, 1988; Payne, Johnson, Bettman, & Coupey, 1990; Schunn, Reder, Nhouyvanisvong, Richards, & Stroffolino, 1997; Siegler & Shipley, 1995). Upon reaching an impasse, the problem solver evaluates the applicability of each available strategy to the current problem. This evaluation is informed by the history of strategy use: what has worked in the past is likely to work again (Lovett & Anderson, 1996; Lovett & Schunn, 1999; Siegler & Shipley, 1995). Based on the current context and past experiences, the solver attempts to identify the strategy that minimizes solution time...
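For concreteness, here is a sketch of utility-based strategy selection in the spirit of the models cited above (e.g., Lovett & Anderson, 1996); the utilities, learning rate, and softmax temperature are assumed values, and the iterative second evaluation is indicated only schematically.

```python
import math
import random

# Illustrative sketch of utility-based strategy selection. The utilities,
# learning rate, and temperature are assumed values, not fitted parameters.

utilities = {"mental": 0.0, "calculator": 0.0}

def choose(temperature: float = 1.0) -> str:
    """Softmax choice over current strategy utilities."""
    names = list(utilities)
    weights = [math.exp(utilities[s] / temperature) for s in names]
    return random.choices(names, weights=weights)[0]

def update(strategy: str, payoff: float, alpha: float = 0.1) -> None:
    """What worked in the past becomes more likely to be chosen again."""
    utilities[strategy] += alpha * (payoff - utilities[strategy])

# An iterative account adds a second evaluation after movement begins:
# the solver re-runs the choice with updated context (e.g., observed
# calculator responsiveness) and may switch to the other strategy.
first = choose()
update("calculator", payoff=-1.0)   # e.g., the calculator was sluggish
second = choose()                   # the "second guess"
print(first, second, utilities)
```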