Reinforcement learning involves decision-making in dynamic and uncertain environments and constitutes a crucial element of artificial intelligence. In our previous work, we experimentally demonstrated that the ultrafast chaotic oscillatory dynamics of lasers can be used to efficiently solve the two-armed bandit problem, which requires decision-making concerning a class of difficult trade-offs called the exploration–exploitation dilemma. However, only two selections were employed in that research; hence, the scalability of the laser-chaos-based reinforcement learning should be clarified. In this study, we demonstrated a scalable, pipelined principle of resolving the multi-armed bandit problem by introducing time-division multiplexing of chaotically oscillated ultrafast time series. The experimental demonstrations in which bandit problems with up to 64 arms were successfully solved are presented where laser chaos time series significantly outperforms quasiperiodic signals, computer-generated pseudorandom numbers, and coloured noise. Detailed analyses are also provided that include performance comparisons among laser chaos signals generated in different physical conditions, which coincide with the diffusivity inherent in the time series. This study paves the way for ultrafast reinforcement learning by taking advantage of the ultrahigh bandwidths of light wave and practical enabling technologies.
We investigate the effect of a memory parameter on the performance of adaptive decision making using a tug-of-war method with the chaotic oscillatory dynamics of a semiconductor laser. We experimentally generate chaotic temporal waveforms of the semiconductor laser with optical feedback and apply them for adaptive decision making in solving a multiarmed bandit problem that aims at maximizing the total reward from slot machines whose hit probabilities are dynamically switched. We examine the dependence of making correct decisions on different values of the memory parameter. The degree of adaptivity is found to be enhanced with a smaller memory parameter, whereas the degree of convergence to the correct decision is higher for a larger memory parameter. The relations among the adaptivity, environmental changes, and the difficulties of the problem are also discussed considering the requirement of past decisions. This examination of ultrafast adaptive decision making highlights the importance of memorizing past events and paves the way for future photonic intelligence.
Photonic technologies are promising for solving complex tasks in artificial intelligence. In this paper, we numerically investigate decision making for solving the multi-armed bandit problem using lag synchronization of chaos in a ring laser-network configuration. We construct a laser network consisting of unidirectionally coupled semiconductor lasers, whereby spontaneous exchange of the leader-laggard relationship in the lag synchronization of chaos is observed. We succeed in solving the multi-armed bandit problems with three slot machines using lag synchronization of chaos by controlling the coupling strengths among the three lasers. Furthermore, we investigate the scalability of the proposed decision-making principle by increasing the number of slot machines and lasers. This study suggests a new direction in laser network-based decision making for future photonic intelligent functions.
Efficient and accurate decision making is gaining increased importance with the rapid expansion of information communication technologies including artificial intelligence. Here, we propose and experimentally demonstrate an on-chip, integrated photonic decision maker based on a ring laser. The ring laser exhibits spontaneous switching between clockwise and counter-clockwise oscillatory dynamics; we utilize such nature to solve a multi-armed bandit problem. The spontaneous switching dynamics provides efficient exploration to find the accurate decision. On-line decision making is experimentally demonstrated including autonomous adaptation to an uncertain environment. This study paves the way for directly utilizing the fluctuating physics inherent in ring lasers, or integrated photonics technologies in general, for achieving or accelerating intelligent functionality.
Photonic accelerators have attracted increasing attention for use in artificial intelligence applications. The multi-armed bandit problem is a fundamental problem of decision making using reinforcement learning. However, to the best of our knowledge, the scalability of photonic decision making has not yet been demonstrated in experiments because of the technical difficulties in the physical realization. We propose a parallel photonic decision-making system to solve large-scale multi-armed bandit problems using optical spatiotemporal chaos. We solved a 512-armed bandit problem online, which is larger than those in previous experiments by two orders of magnitude. The scaling property for correct decision making is examined as a function of the number of slot machines, evaluated as an exponent of 0.86. This exponent is smaller than that in previous studies, indicating the superiority of the proposed parallel principle. This experimental demonstration facilitates photonic decision making to solve large-scale multi-armed bandit problems for future photonic accelerators.
Reservoir computing is a machine learning paradigm that uses a structure called a reservoir, which has nonlinearities and short-term memory. In recent years, reservoir computing has expanded to new functions such as the autonomous generation of chaotic time series, as well as time series prediction and classification. Furthermore, novel possibilities have been demonstrated, such as inferring the existence of previously unseen attractors. Sampling, in contrast, has a strong influence on such functions. Sampling is indispensable in a physical reservoir computer that uses an existing physical system as a reservoir because the use of an external digital system for the data input is usually inevitable. This study analyzes the effect of sampling on the ability of reservoir computing to autonomously regenerate chaotic time series. We found, as expected, that excessively coarse sampling degrades the system performance, but also that excessively dense sampling is unsuitable. Based on quantitative indicators that capture the local and global characteristics of attractors, we identify a suitable window of the sampling frequency and discuss its underlying mechanisms.
Collective decision making is vital for recent information and communications technologies. In our previous research, we mathematically derived conflict-free joint decision making that optimally satisfies players' probabilistic preference profiles. However, two problems exist regarding the optimal joint decision-making method. First, as the number of choices increases, the computational cost of calculating the optimal joint selection probability matrix explodes. Second, to derive the optimal joint selection probability matrix, all players must disclose their probabilistic preferences. Now, it is noteworthy that explicit calculation of the joint probability distribution is not necessarily needed; what is necessary for collective decisions is sampling. This study examines several sampling methods that converge to heuristic joint selection probability matrices that satisfy players' preferences. We show that they can significantly reduce the above problems of computational cost and confidentiality. We analyze the probability distribution each of the sampling methods converges to, as well as the computational cost required and the confidentiality secured. In particular, we introduce two conflict-free joint sampling methods through the quantum interference of photons. The first system allows the players to hide their choices while satisfying the players' preferences almost perfectly when they have the same preferences. The second system, where the physical nature of light replaces the expensive computational cost, also conceals their choices under the assumption that they have a trusted third party.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.