Abstract: There are many different approaches to building a system that can engage in autonomous mental development. In this paper, we present an approach based on what we term self-understanding, by which we mean the explicit representation of and reasoning about what a system does and does not know, and how that knowledge changes under action. We present an architecture and a set of representations used in two robot systems that exhibit a limited degree of autonomous mental development, which we term self-extension. The contributions include representations of gaps and uncertainty for specific kinds of knowledge, and a goal management and planning system for setting and achieving learning goals.
This article presents an integrated robot system capable of interactive learning in dialogue with a human. Such a system needs several competencies and must be able to process different types of representations. In this article, we describe a collection of mechanisms that enable the integration of heterogeneous competencies in a principled way. Central to our design is the creation of beliefs from visual and linguistic information, and the use of these beliefs for planning system behaviour to satisfy internal drives. The system is able to detect gaps in its knowledge and to plan and execute actions that provide the information needed to fill these gaps. We propose a hierarchy of mechanisms that are capable of engaging in different kinds of learning interactions, e.g. those initiated by a tutor or by the system itself. We present the theory these mechanisms are built upon and an instantiation of this theory in the form of an integrated robot system. We demonstrate the operation of the system in the case of learning conceptual models of objects and their visual properties.
Abstract: In this paper we present representations and mechanisms that facilitate continuous learning of visual concepts in dialogue with a tutor, and we show the implemented robot system. We describe how beliefs about the world are created by processing visual and linguistic information, and show how they are used for planning system behaviour with the aim of satisfying the system's internal drive to extend its knowledge. The system facilitates different kinds of learning, initiated by the human tutor or by the system itself. We demonstrate these principles in the case of learning about object colours and basic shapes.
Abstract: We present a general method for integrating visual components into a multi-modal cognitive system. The integration is generic and can work with an arbitrary set of modalities. We illustrate our integration approach with a specific instantiation of the architecture schema that focuses on the integration of vision and language: a cognitive system able to collaborate with a human, learn, and display some understanding of its surroundings. As examples of cross-modal interaction we describe mechanisms for clarification and visual learning.
Semantic visual perception for knowledge acquisition plays an important role in human cognition, as well as in the learning process of any cognitive robot. In this paper, we present a visual information abstraction mechanism designed for continuously learning robotic systems. We generate spatial information in the scene by considering plane estimation and stereo line detection coherently within a unified probabilistic framework, and show how spaces of interest (SOIs) are generated and segmented using this spatial information. We also demonstrate how the existence of SOIs is validated in the long-term learning process. The proposed mechanism facilitates robust visual information abstraction, which is a requirement for continuous interactive learning. Experiments demonstrate that, with the refined spatial information, our approach provides accurate and plausible representations of visual objects.
Multi-modal grounded language learning connects language predicates to physical properties of objects in the world. Sensing with multiple modalities, such as audio, haptics, and visual colors and shapes, while performing interaction behaviors like lifting, dropping, and looking at objects enables a robot to ground non-visual predicates like "empty" as well as visual predicates like "red". Previous work has established that grounding in multi-modal space improves performance on object retrieval from human descriptions. In this work, we gather behavior annotations from humans and demonstrate that these improve language grounding performance by allowing a system to focus on relevant behaviors for words like "white" or "half-full" that can be understood by looking or lifting, respectively. We also explore adding modality annotations (whether to focus on audio or haptics when performing a behavior), which improves performance, and sharing information between linguistically related predicates (if "green" is a color, "white" is a color), which improves grounding recall but at the cost of precision.