A broadly used computational framework posits that two learning systems operate in parallel during the learning of choice preferences, namely the model-free and model-based reinforcement-learning systems. In this study, we examined another possibility, in which model-free learning is the basic system and model-based information acts as its modulator. Accordingly, we proposed several modified versions of a temporal-difference learning model to explain the choice-learning process. Using the two-stage decision task developed by Daw, Gershman, Seymour, Dayan, and Dolan (2011), we compared their original computational model, which assumes a parallel learning process, with our proposed models, which assume a sequential learning process. Choice data from 23 participants showed a better fit with the proposed models. More specifically, the proposed eligibility adjustment model, which assumes that the environmental model can weight the degree of the eligibility trace, explains choices better under both model-free and model-based control and has a simpler computational algorithm than the original model. In addition, the forgetting learning model and its variant, which assume changes in the values of unchosen actions, substantially improved the fits to the data. Overall, we show that a hybrid computational model best fits the data. The parameters of this model succeed in capturing individual tendencies with respect to both model use in learning and exploration behavior. This computational model provides novel insights into learning with interacting model-free and model-based components.
Keywords: Computational model · Model-free · Model-based · Eligibility trace · Reinforcement learning

One common theoretical framework holds that value-based decision-making is realized by two distinct cognitive or learning systems: one is habitual and inflexible and requires little computation, whereas the other is deliberative and accurate and requires heavy computation (Dickinson, 1985; Kahneman, 2010; Redish, Jensen, & Johnson, 2008). In the field of instrumental learning, these two systems correspond to the model-free and model-based learning systems, respectively (Daw, Niv, & Dayan, 2005; Dolan & Dayan, 2013; Gillan, Otto, Phelps, & Daw, 2015). Prediction based on model-free learning is analogous to Thorndike's law of effect, in which a behavior that is followed by a pleasant outcome is likely to be repeated, whereas a behavior that is followed by an unpleasant outcome is likely to be inhibited (Thorndike, 1911). In contrast, the model-based learning system uses the agent's internal model, or cognitive map (Tolman, 1948), of the structure of the environment to dynamically adjust behavior by propagating information to all states and actions, including those that have not previously been experienced. However, it has yet to be determined how humans and animals form preferences on the basis of these learning systems and how the interaction between these systems is implemented.
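To make the contrast between the two systems concrete, the following is a minimal sketch of the parallel hybrid account of Daw et al. (2011) for a toy two-stage task: model-free first-stage values are updated by SARSA(λ)-style prediction errors carried back through an eligibility trace, while model-based values are computed by planning over the transition structure, and the two are mixed with a weight w. The parameter names (alpha, lam, w) and the transition matrix are illustrative assumptions, not the paper's actual fitted values or code.

```python
import numpy as np

# Illustrative parameters (not fitted values from the paper):
alpha, lam, w = 0.3, 0.9, 0.5   # learning rate, eligibility trace, model-based weight

# Assumed toy transition structure: each of 2 first-stage actions leads to
# one of 2 second-stage states with common (0.7) vs. rare (0.3) transitions.
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])      # T[a1, s2] = P(second-stage state | first-stage action)

q_mf1 = np.zeros(2)             # model-free first-stage action values
q2 = np.zeros((2, 2))           # second-stage state-action values

def update(a1, s2, a2, reward):
    """Model-free learning update after one completed trial."""
    delta1 = q2[s2, a2] - q_mf1[a1]     # first-stage prediction error
    delta2 = reward - q2[s2, a2]        # second-stage prediction error
    # The eligibility trace lam carries the second-stage error back to stage 1.
    q_mf1[a1] += alpha * delta1 + alpha * lam * delta2
    q2[s2, a2] += alpha * delta2

def q_hybrid():
    """Mix model-based (planned) and model-free first-stage values."""
    q_mb1 = T @ q2.max(axis=1)          # model-based: expected best stage-2 value
    return w * q_mb1 + (1 - w) * q_mf1
```

For example, after a single rewarded trial through action 0 and state 0, the eligibility trace raises q_mf1[0] even though the bootstrapped first-stage error delta1 is still zero; the model-based term then generalizes that reward to both first-stage actions in proportion to the transition probabilities.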