2017
DOI: 10.1007/978-3-319-63703-7_22

Bandit Models of Human Behavior: Reward Processing in Mental Disorders

Abstract: Drawing inspiration from behavioral studies of human decision making, we propose here a general parametric framework for the multi-armed bandit problem, which extends the standard Thompson Sampling approach to incorporate reward processing biases associated with several neurological and psychiatric conditions, including Parkinson's and Alzheimer's diseases, attention-deficit/hyperactivity disorder (ADHD), addiction, and chronic pain. We demonstrate empirically that the proposed parametric approach can often out…

Cited by 12 publications (23 citation statements) | References 17 publications
“…We now outline the split models evaluated in our three settings: the MAB case with the Human-Based Thompson Sampling (HBTS) [10], the CB case with the Split Contextual Thompson Sampling (SCTS), and the RL case with the Split Q-Learning [29,32]. All three split agent classes are standardized in their parametric notation (see Table 1 for a complete parametrization and Appendix A for further review of these clinically-inspired reward-processing biases).…”
Section: Two-Stream Split Models in MAB, CB, and RL
confidence: 99%
“…Split Multi-Armed Bandit Model. The split MAB agent is built upon Human-Based Thompson Sampling (HBTS, Algorithm 1) [10]. The positive and negative streams are stored in the success and failure counts S_a and F_a, respectively.…”
Section: Two-Stream Split Models in MAB, CB, and RL
confidence: 99%
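The two-stream bookkeeping quoted above can be sketched in a few lines. This is a minimal illustration, not the authors' exact Algorithm 1: the arms are assumed Bernoulli-style, and the bias weights `w_pos`/`w_neg` on the two streams are hypothetical knobs standing in for the clinically-inspired reward-processing biases (with both weights at 1.0, the sketch reduces to standard Thompson Sampling).

```python
import random

def hbts_step(S, F, w_pos=1.0, w_neg=1.0):
    """One round of a split Thompson Sampling sketch.

    S[a], F[a] hold the positive (success) and negative (failure)
    stream counts for arm a. w_pos and w_neg are hypothetical bias
    weights on the two streams; at 1.0 each this is plain
    Bernoulli Thompson Sampling.
    """
    # Sample one Beta posterior draw per arm, weighting each stream,
    # and play the arm with the largest draw.
    draws = [random.betavariate(1 + w_pos * S[a], 1 + w_neg * F[a])
             for a in range(len(S))]
    return max(range(len(S)), key=lambda a: draws[a])

def update(S, F, a, reward):
    # Route the observed reward into the matching stream.
    if reward > 0:
        S[a] += reward
    else:
        F[a] += -reward
```

A biased agent is then modeled simply by moving `w_pos` and `w_neg` away from 1.0, e.g. over-weighting the negative stream to dampen exploration of recently punished arms.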
“…Homeostatic decision-making has been combined with models for hormonal control (Avila-García and Cañamero, 2005) and cognitive modulation (Bach, 2015). It also supports decision-making influenced by individual personality traits (Bouneffouf, Rish, and Cecchi, 2017). For modeling animals with multiple needs, e.g.…”
Section: Animats
confidence: 99%
“…to regulate their homeostatic variables and thus survive as long as possible (Keramati and Gutkin, 2011; Yoshida, 2017). Homeostatic decision-making combines naturally with models for hormonal control (Avila-García and Cañamero, 2005), cognitive modulation (Bach, 2015), and personality traits (Bouneffouf, Rish, and Cecchi, 2017). Moreover, homeostatic agents can be naturally linked to reinforcement learning by defining reward as the difference in need status from one time to another.…”
Section: Introduction
confidence: 99%
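The reward definition in the last sentence of that quote can be made concrete with a small sketch. The quadratic drive function and the scalar internal state are illustrative assumptions in the spirit of homeostatic RL (Keramati and Gutkin, 2011), not a specific published parametrization.

```python
def drive(h, setpoint, n=2.0):
    """Drive: distance of the internal state h from its setpoint.
    The exponent n = 2.0 is an illustrative choice."""
    return abs(setpoint - h) ** n

def homeostatic_reward(h_before, h_after, setpoint):
    # Reward is the reduction in drive over the transition:
    # positive when the agent moves toward its setpoint,
    # negative when it moves away.
    return drive(h_before, setpoint) - drive(h_after, setpoint)

# Moving toward the setpoint (5) from state 2 to state 4:
# drive falls from 9.0 to 1.0, so the reward is 8.0.
print(homeostatic_reward(2, 4, 5))  # → 8.0
```

Under this definition, an ordinary reward-maximizing agent that learns to keep its drive low is exactly an agent that regulates its homeostatic variables, which is the link the quoted passage draws.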