The cross-entropy method is an efficient and general optimization algorithm. However, its applicability in reinforcement learning (RL) seems limited because it often converges to suboptimal policies. We apply noise to prevent early convergence of the cross-entropy method, using the computer game Tetris for demonstration. The resulting policy outperforms previous RL algorithms by almost two orders of magnitude.
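The noise injection described above can be illustrated with a minimal sketch. This is a generic noisy cross-entropy optimizer on a toy continuous objective, not the paper's Tetris setup (which optimizes the weights of a linear evaluation function); the function names and hyperparameters are illustrative assumptions.

```python
import numpy as np

def noisy_cem(f, dim, n_samples=100, elite_frac=0.1, iters=50, noise=0.05):
    """Maximize f via the cross-entropy method with additive noise.

    The `noise` term added to the elite standard deviation keeps the
    sampling distribution from collapsing prematurely, which is the
    mechanism used to avoid convergence to suboptimal policies.
    """
    mean, std = np.zeros(dim), np.ones(dim)
    n_elite = int(n_samples * elite_frac)
    for _ in range(iters):
        samples = np.random.randn(n_samples, dim) * std + mean
        scores = np.array([f(x) for x in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]  # best-scoring samples
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + noise  # noise floor prevents early collapse
    return mean

# Toy usage: maximize -||x - 3||^2, whose optimum is x = 3 in every coordinate.
np.random.seed(0)
best = noisy_cem(lambda x: -np.sum((x - 3.0) ** 2), dim=5)
```

Without the noise floor, the elite standard deviation can shrink to near zero within a few iterations, freezing the search at whatever region it happened to sample first.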
The anatomical connectivity and intrinsic properties of entorhinal cortical neurons give rise to ordered patterns of ensemble activity. How entorhinal ensembles form, interact, and accomplish emergent processes such as memory formation is not well understood. We lack sufficient understanding of how neuronal ensembles in general can function transiently and distinctively from other neuronal ensembles. Ensemble interactions are bound, foremost, by anatomical connectivity and temporal constraints on neuronal discharge. We present an overview of the structure of neuronal interactions within the entorhinal cortex and the rest of the hippocampal formation. We wish to highlight two principal features of entorhinal-hippocampal interactions. First, large numbers of entorhinal neurons are organized into at least two distinct high-frequency population patterns: gamma (40–100 Hz) frequency volleys and ripple (140–200 Hz) frequency volleys. These patterns occur coincident with other well-defined electrophysiological patterns. Gamma frequency volleys are modulated by the theta cycle. Ripple frequency volleys occur on each sharp wave event. Second, these patterns occur dominantly in specific layers of the entorhinal cortex. Theta/gamma frequency volleys are the principal pattern observed in layers I–III, in the neurons that receive cortical inputs and project to the hippocampus. Ripple frequency volleys are the principal population pattern observed in layers V–VI, in the neurons that receive hippocampal output and project primarily to the neocortex. Further, we will highlight how these ensemble patterns organize interactions within distributed forebrain structures and support memory formation. Hippocampus 10:457–465, 2000 © 2000 Wiley-Liss, Inc.
It has long been known that macaque inferior temporal (IT) neurons tend to fire more strongly to some shapes than to others, and that different IT neurons can show markedly different shape preferences. Beyond the discovery that these preferences can be elicited by features of moderate complexity, no general principle of (nonface) object recognition had emerged by which this enormous variation in selectivity could be understood. Psychophysical as well as computational work suggests that one such principle is the difference between viewpoint-invariant, nonaccidental shape properties (NAPs) and view-dependent, metric shape properties (MPs). We measured the responses of single IT neurons to objects differing in either a NAP (namely, a change in a geon) or an MP of a single part, shown at two orientations in depth. The cells were more sensitive to changes in NAPs than in MPs, even though the image variation (as assessed by wavelet-like measures) produced by the former was smaller than that produced by the latter. The magnitude of the response modulation from the rotation itself was, on average, similar to that produced by the NAP differences, although the image changes from the rotation were much greater than those produced by NAP differences. Multidimensional scaling of the neural responses indicated a NAP/MP dimension, independent of an orientation dimension. The present results thus demonstrate that a significant portion of the neural code of IT cells represents differences in NAPs rather than MPs. This code may enable immediate recognition of novel objects at new views.
The computational model described here is driven by the hypothesis that a major function of the entorhinal cortex (EC)-hippocampal system is to alter synaptic connections in the neocortex. It is based on the following postulates: (1) The EC compares the difference between neocortical representations (primary input) and feedback information conveyed by the hippocampus (the "reconstructed input"). The difference between the primary input and the reconstructed input (termed "error") initiates plastic changes in the hippocampal networks (error compensation). (2) Comparison of the primary input and reconstructed input requires that these representations are available simultaneously in the EC network. We suggest that compensation of time delays is achieved by predictive structures, such as the CA3 recurrent network and EC-CA1 connections. (3) Alteration of intrahippocampal connections gives rise to a new hippocampal output. The hippocampus generates separated (independent) outputs, which, in turn, train long-term memory traces in the EC (independent components, IC). The ICs of the long-term memory trace are generated in a two-step manner, the operations of which we attribute to the activities of the CA3 (whitening) and CA1 (separation) fields. (4) The different hippocampal fields can perform both nonlinear and linear operations, albeit at different times (theta and sharp phases). We suggest that long-term memory is represented in a distributed and hierarchical reconstruction network, which is under the supervision of the hippocampal output. Several of these model predictions can be tested experimentally.
COMPUTATIONAL ASSUMPTIONS IN THE HIPPOCAMPAL-ENTORHINAL CORTEX SYSTEM
The main goal of this chapter is to discuss how the various subnetworks of the entorhinal-hippocampal system can perform different operations depending on the "state" of the brain. We attempt to describe the entorhinal cortex (EC)-hippocampal system as a collection of structure-function relationships.
Our basic assumption is that the major function subserved by the hippocampus is to develop a representation that can rehearse past events (episodes) and can make predictions about ongoing events based on previously learned temporal sequences. From this single principle and theoretical considerations, we assign symbolic (mathematical) functions to each anatomical field of the EC-hippocampal loop. We also derive that an important function of the c
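The two-step generation of independent components that the model attributes to CA3 (whitening) and CA1 (separation) can be illustrated with a standard linear-algebra sketch. This is generic PCA-whitening, the usual first stage of ICA pipelines, and not a simulation of the hippocampal model itself; the data-generating step is purely illustrative.

```python
import numpy as np

def whiten(X):
    """Decorrelate X (n_samples, n_features) and rescale to unit variance.

    This is the 'whitening' step the model assigns to CA3; the subsequent
    'separation' step (attributed to CA1) would rotate the whitened data
    to maximize statistical independence, as in standard ICA.
    """
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    W = eigvecs / np.sqrt(eigvals)  # PCA whitening matrix (scaled eigenvectors)
    return Xc @ W

# Mix four random signals, then whiten: the output covariance is the identity.
rng = np.random.default_rng(0)
Z = whiten(rng.standard_normal((500, 4)) @ rng.standard_normal((4, 4)))
```

After whitening, the components are uncorrelated but not yet independent; a further rotation (the separation step) is needed to recover independent components.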
In this article we propose a method that can deal with certain combinatorial reinforcement learning tasks. We demonstrate the approach in the popular Ms. Pac-Man game. We define a set of high-level observation and action modules, from which rule-based policies are constructed automatically. In these policies, actions are temporally extended, and may work concurrently. The policy of the agent is encoded by a compact decision list. The components of the list are selected from a large pool of rules, which can be either hand-crafted or generated automatically. A suitable selection of rules is learnt by the cross-entropy method, a recent global optimization algorithm that fits our framework smoothly. Cross-entropy-optimized policies perform better than our hand-crafted policy, and reach the score of average human players. We argue that learning is successful mainly because (i) policies may apply concurrent actions and thus the policy space is sufficiently rich, and (ii) the search is biased towards low-complexity policies, and therefore solutions with a compact description can be found quickly if they exist.
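A decision-list policy of the kind described above can be sketched as an ordered list of (condition, action) rules in which the first matching rule fires. The rule and action names below are illustrative inventions, not the modules used in the paper.

```python
# Hypothetical sketch of a decision-list policy: scan the rules in
# order and return the action of the first rule whose condition
# matches the current high-level observation.
def evaluate_decision_list(rules, observation):
    for condition, action in rules:
        if condition(observation):
            return action
    return "default"  # fall-through action when no rule matches

# Illustrative rules; in the paper's framework the selection and
# ordering of such rules is what the cross-entropy method optimizes.
rules = [
    (lambda obs: obs.get("ghost_near", False), "flee"),
    (lambda obs: obs.get("power_pill_near", False), "grab_pill"),
]

evaluate_decision_list(rules, {"ghost_near": True})
```

Because earlier rules shadow later ones, the ordering of the list is part of the policy, which is one reason compact lists form a rich yet searchable policy space.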
We introduce the blind subspace deconvolution (BSSD) problem, which is the extension of both the blind source deconvolution (BSD) and the independent subspace analysis (ISA) tasks. We examine the case of the undercomplete BSSD (uBSSD). Applying temporal concatenation, we reduce this problem to ISA. The associated 'high dimensional' ISA problem can be handled by a recent technique called joint f-decorrelation (JFD). Similar decorrelation methods have been used previously for kernel independent component analysis (kernel-ICA). More precisely, the kernel canonical correlation (KCCA) technique is a member of this family, and, as is shown in this paper, the kernel generalized variance (KGV) method can also be seen as a decorrelation method in the feature space. These kernel-based algorithms will be adapted to the ISA task. In the numerical examples, we (i) examine how efficiently the emerging higher dimensional ISA tasks can be tackled, and (ii) explore the working and advantages of the derived kernel-ISA methods.
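The temporal-concatenation step can be sketched as follows: stacking L consecutive observation vectors turns a convolutive mixture into a single, higher-dimensional instantaneous linear mixture, which can then be handed to an ISA solver. The function below is a minimal illustration of the stacking itself, with an assumed window length; it does not implement the subsequent ISA/JFD stage.

```python
import numpy as np

def temporal_concatenation(X, L):
    """Stack L consecutive rows of X (shape (T, d)) side by side.

    Returns a matrix of shape (T - L + 1, d * L); each row holds the
    observations x_t, x_{t+1}, ..., x_{t+L-1}, so a length-L convolutive
    mixture of the sources appears as an instantaneous mixture of the
    stacked vectors.
    """
    T, d = X.shape
    return np.hstack([X[l : T - L + 1 + l] for l in range(L)])

# Usage: 100 time steps of 3-dimensional observations, window length 4.
X = np.random.randn(100, 3)
Z = temporal_concatenation(X, L=4)  # shape (97, 12)
```

The price of the reduction is dimensionality: the resulting ISA task lives in d·L dimensions, which is why the paper pairs it with methods that scale to 'high dimensional' ISA.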