According to complementary learning systems theory, integrating new memories into the neocortex of the brain without interfering with what is already known depends on a gradual learning process, interleaving new items with previously learned items. However, empirical studies show that information consistent with prior knowledge can sometimes be integrated very quickly. We use artificial neural networks with properties like those we attribute to the neocortex to develop an understanding of the role of consistency with prior knowledge in putatively neocortex-like learning systems, providing new insights into when integration will be fast or slow and how integration might be made more efficient when the items to be learned are hierarchically structured. The work relies on deep linear networks that capture the qualitative aspects of the learning dynamics of the more complex nonlinear networks used in previous work. The time course of learning in these networks can be linked to the hierarchical structure in the training data, captured mathematically as a set of dimensions that correspond to the branches in the hierarchy. In this context, a new item to be learned can be characterized as having aspects that project onto previously known dimensions, and others that require adding a new branch/dimension. The projection onto the known dimensions can be learned rapidly without interleaving, but learning the new dimension requires gradual interleaved learning. When a new item only overlaps with items within one branch of a hierarchy, interleaving can focus on the previously known items within this branch, resulting in faster integration with less interleaving overall. The discussion considers how the brain might exploit these facts to make learning more efficient and highlights predictions about what aspects of new information might be hard or easy to learn. This article is part of the Theo Murphy meeting issue ‘Memory reactivation: replaying events past, present and future’.
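The split described in this abstract, between the part of a new item that projects onto previously known dimensions and the part that needs a new dimension, can be illustrated with a minimal sketch. The feature vectors, the two orthonormal `known_dims`, and the helper `split_item` are all hypothetical toy values chosen for illustration; they are not taken from the article.

```python
# Toy illustration (assumed values): decompose a new item into the part
# that lies in the span of known dimensions and the residual that would
# require a new dimension.

def dot(u, v):
    """Inner product of two equal-length lists of numbers."""
    return sum(a * b for a, b in zip(u, v))

def split_item(item, known_dims):
    """Return (projection, residual) of `item`.

    `known_dims` is assumed to be a list of orthonormal vectors, so the
    projection is the sum of (item . d) * d over the known dimensions.
    """
    projection = [0.0] * len(item)
    for d in known_dims:
        coeff = dot(item, d)
        projection = [p + coeff * x for p, x in zip(projection, d)]
    residual = [a - p for a, p in zip(item, projection)]
    return projection, residual

# Two hypothetical orthonormal dimensions over four features.
known_dims = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, 0.5, -0.5, -0.5],
]
new_item = [1.0, 1.0, 0.0, 1.0]

projection, residual = split_item(new_item, known_dims)
print("projection onto known dimensions:", projection)
print("residual needing a new dimension:", residual)
```

In the abstract's terms, the projection is the aspect that could be learned rapidly without interleaving, while a nonzero residual is the aspect that would call for gradual interleaved learning.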
Replications are important to science, but who will do them? One proposal is that students can conduct replications as part of their training. As a proof-of-concept for this idea, here we report a series of 11 pre-registered replications of findings from the 2015 volume of Psychological Science, all conducted as part of a graduate-level course. Congruent with larger, more systematic efforts, replications typically yielded smaller effects than originals: The modal outcome was partial support for the original claim. This work documents the challenges facing motivated students as they attempt to replicate previously published results on a first attempt. We describe the workflow and pedagogical methods that were used in the class and discuss implications both for the adoption of this pedagogical model and for replication research more broadly.
Keywords: Replication; Reproducibility; Pedagogy; Experimental Methods
REPLICATION THROUGH PEDAGOGY
Improving the Replicability of Psychological Science Through Pedagogy
Replicability is a core value for empirical research and there is increasing concern throughout psychology that more independent replication is necessary (Open Science Collaboration, 2015; Wagenmakers, Wetzels, Borsboom, Maas, & Kievit, 2012). Yet under the current incentive structure for science, replication is not typically valued for publication in top journals (Makel, Plucker, & Hegarty, 2012) or in metrics of research productivity (Koole & Lakens, 2012). One potential solution to this problem is to make replication an explicit part of pedagogy: that is, to teach students about experimental methods by asking them to run replication studies (Frank & Saxe, 2012; Grahe et al., 2012). Despite enthusiasm for this idea (Everett & Earp, 2015; M. King et al., 2016; LeBel, 2015; Standing, 2016), there is limited data beyond anecdotal reports and individual projects (Lakens, 2013; e.g., Phillips et al., 2015) to support its efficacy in producing wide-scale pedagogical adoption.
In the current article, we describe the pedagogical and methodological approach to replication research taken in our graduate-level experimental methods course and address the practical barriers faced by instructors planning to incorporate replications into their courses. In our course, students conducted replications of published articles from the 2015 volume of the journal Psychological Science with rigorous instructor review at each major stage. The results of these replications are a microcosm of larger replication efforts, providing insight into both the difficulties of pedagogical replications and their promise as a method for improving the robustness of psychological research.
We assess the challenges facing a student in choosing an article of interest and (in a single attempt, within constraints of budget, expertise, and effort) reproducing the findings. We consider a number of criteria for evaluating replication success, including statistical significance, effect size, a Bayesian measure of evidence (Etz & ...
An important aspect of intelligence is the ability to adapt to a novel task without any direct experience (zero shot), based on its relationship to previous tasks. Humans can exhibit this cognitive flexibility. By contrast, models that achieve superhuman performance in specific tasks often fail to adapt to even slight task alterations. To address this, we propose a general computational framework for adapting to novel tasks based on their relationship to prior tasks. We begin by learning vector representations of tasks. To adapt to new tasks, we propose metamappings, higher-order tasks that transform basic task representations. We demonstrate the effectiveness of this framework across a wide variety of tasks and computational paradigms, ranging from regression to image classification and reinforcement learning. We compare to both human adaptability and language-based approaches to zero-shot learning. Across these domains, metamapping is successful, often achieving 80 to 90% performance, without any data, on a novel task, even when the new task directly contradicts prior experience. We further show that metamapping can not only generalize to new tasks via learned relationships, but can also generalize using novel relationships unseen during training. Finally, using metamapping as a starting point can dramatically accelerate later learning on a new task and reduce learning time and cumulative error substantially. Our results provide insight into a possible computational basis of intelligent adaptability and offer a possible framework for modeling cognitive flexibility and building more flexible artificial intelligence systems.
Previous research has found that different presentations of the same concept can result in different patterns of transfer to isomorphic instances of that concept. Much of this work has framed these effects in terms of advantages and disadvantages of concreteness or abstractness. We note that mathematics is a richly structured field, with deeply interconnected concepts and many distinct aspects of understanding of each concept, and we discuss difficulties with the idea that differences among presentations can be ordered on a concrete-abstract dimension. To move beyond this, we explore how different presentations of a concept can affect learning of subsequent concepts and assess several distinct aspects of understanding. Using the domain of elementary group theory, we teach adult participants a group operation using a visuospatial or an arithmetic presentation. We then teach them concepts that build upon this operation. We demonstrate that our presentations differentially support learning complementary aspects of the system presented. We argue that these differences arise from the fact that each presentation supports learning by connecting to different systems of reasoning learners are already familiar with, and that it is these connections to extant knowledge systems, rather than differences in concreteness versus abstractness, that determine whether a presentation will be helpful. Furthermore, we show that providing both presentations and encouraging participants to recognize the relationship between them improves performance without requiring additional time, at least for some participants. Educational Impact and Implications Statement: The details of how a concept is taught can have far-reaching effects on students' learning. Using abstract algebra with adult subjects, we show that 2 presentations of a concept that connect to different types of students' prior knowledge can have advantages and disadvantages for later learning that builds on the target concept. We show that one possible solution to the dilemma of choosing which presentation to use is giving students both concepts and explaining how they are related to each other. In summary, when designing pedagogical materials, we should consider not only how they affect learning of the present concept, but also how they support learning of future concepts, and use multiple complementary presentations rather than searching for a single ideal one.
Continuous first-person 3D environments pose unique exploration challenges to reinforcement learning (RL) agents because of their high-dimensional state and action spaces. These challenges can be ameliorated by using semantically meaningful state abstractions to define novelty for exploration. We propose that learned representations shaped by natural language provide exactly this form of abstraction. In particular, we show that vision-language representations, when pretrained on image captioning datasets sampled from the internet, can drive meaningful, task-relevant exploration and improve performance on 3D simulated environments. We also characterize why and how language provides useful abstractions for exploration by comparing the impacts of using representations from a pretrained model, a language oracle, and several ablations. We demonstrate the benefits of our approach in two very different task domains (one that stresses the identification and manipulation of everyday objects, and one that requires navigational exploration in an expansive world) and with two popular deep RL algorithms: Impala and R2D2. Our results suggest that using language-shaped representations could improve exploration for various algorithms and agents in challenging environments. Preprint. Under review.
The ability to use symbols is the pinnacle of human intelligence, but has yet to be fully replicated in machines. Here we argue that the path towards symbolically fluent artificial intelligence (AI) begins with a reinterpretation of what symbols are, how they come to exist, and how a system behaves when it uses them. We begin by offering an interpretation of symbols as entities whose meaning is established by convention. But crucially, something is a symbol only for those who demonstrably and actively participate in this convention. We then outline how this interpretation thematically unifies the behavioural traits humans exhibit when they use symbols. This motivates our proposal that the field place a greater emphasis on symbolic behaviour rather than particular computational mechanisms inspired by more restrictive interpretations of symbols. Finally, we suggest that AI research explore social and cultural engagement as a tool to develop the cognitive machinery necessary for symbolic behaviour to emerge. This approach will allow for AI to interpret something as symbolic on its own rather than simply manipulate things that are only symbols to human onlookers, and thus will ultimately lead to AI with more human-like symbolic fluency.
Large language models can perform new tasks by adapting to a few in-context examples. For humans, rapid learning from examples can benefit from explanations that connect examples to task principles. We therefore investigate whether explanations of few-shot examples can allow language models to adapt more effectively. We annotate a set of 40 challenging tasks from BIG-bench (BIG-bench collaboration, 2021) with explanations of answers to a small subset of questions, as well as a variety of matched control explanations. We evaluate the effects of various zero-shot and few-shot prompts that include different types of explanations, instructions, and controls on the performance of a range of large language models. We analyze these results using statistical multilevel modeling techniques that account for the nested dependencies among conditions, tasks, prompts, and models. We find that explanations of examples can improve performance. Adding untuned explanations to a few-shot prompt offers a modest improvement in performance: about 1/3 the effect size of adding few-shot examples, but twice the effect size of task instructions. We then show that explanations tuned for performance on a small validation set offer substantially larger benefits; building a prompt by selecting examples and explanations together substantially improves performance over selecting examples alone. Hand-tuning explanations can substantially improve performance on challenging tasks. Furthermore, even untuned explanations outperform carefully matched controls, suggesting that the benefits are due to the link between an example and its explanation, rather than lower-level features of the language used. However, only large models can benefit from explanations. In summary, explanations can support the in-context learning abilities of large language models on challenging tasks.
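As a rough illustration of what a few-shot prompt with explanations might look like, the sketch below assembles one from a list of examples. The `Q:`/`A:`/`Explanation:` layout, the `build_prompt` helper, and the sample question are assumptions made for illustration; the abstract does not specify the prompt format the study used.

```python
# Hypothetical prompt builder: the layout below is an assumed format,
# not the one used in the study.

def build_prompt(examples, question, instruction=None, with_explanations=True):
    """Assemble a few-shot prompt, optionally adding an explanation per example."""
    parts = []
    if instruction:
        parts.append(instruction)
    for ex in examples:
        block = f"Q: {ex['question']}\nA: {ex['answer']}"
        if with_explanations and ex.get("explanation"):
            block += f"\nExplanation: {ex['explanation']}"
        parts.append(block)
    # The target question is left open for the model to complete.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

# Made-up example data.
examples = [
    {
        "question": "Is 17 a prime number?",
        "answer": "Yes",
        "explanation": "17 has no divisors other than 1 and itself.",
    },
]

with_expl = build_prompt(examples, "Is 21 a prime number?",
                         instruction="Answer yes or no.")
without_expl = build_prompt(examples, "Is 21 a prime number?",
                            instruction="Answer yes or no.",
                            with_explanations=False)
print(with_expl)
```

Toggling `with_explanations` yields matched prompts that differ only in the explanation lines, which is the kind of contrast the comparison between few-shot prompts with and without explanations relies on.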