An intelligent wheelchair (IWC) prototype system, ACCoMo, is developed to aid safe indoor mobility for physically challenged people. ACCoMo, as an agent, can acquire autonomous, cooperative, and collaborative behaviors. The autonomous behavior realizes safe and effective moves by observing the local real environment. The cooperative behavior emerges dynamically from interactions with other ACCoMo agents. The collaborative behavior aims to assist user operations and provides functions for connecting to various ubiquitous devices. These behaviors are acquired through the learning and evolution of intelligent ACCoMo agents via their experience in real or virtual environments. Through experiments in real-world environments, it is shown that the agent can acquire these intelligent behaviors on ACCoMo.
This paper presents a new simulation method for crowd behavior that uses a two-layer model consisting of a multiagent (MA) framework and cellular automata (CA). The features of this method are as follows. (1) Complicated crowd behavior emerges from the autonomous actions of agents. (2) The autonomous action process is separated from the restriction of physical interference. Using a simulation system implementing the two-layer model, crowd behavior simulations are realized. In particular, collision cases of counterflowing crowds are analyzed in detail, and interesting results are found. (1) A homogeneous agent crowd tends to form whirlpools, waves, and blanks, and to walk slowly. (2) A heterogeneous agent crowd forms lines and then flows efficiently. Experimental results show that combining MA and CA is effective for easily realizing complicated crowd behavior in various environments.
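The two-layer separation described above can be illustrated with a minimal sketch (a simplified assumption of the idea, not the authors' implementation): each agent autonomously proposes its next move toward its goal (MA layer), and a grid of cells then resolves physical interference by rejecting conflicting or occupied moves (CA layer). The grid size, agent encoding, and conflict rule here are all illustrative choices.

```python
WIDTH = 20  # illustrative 1-D corridor width with wrap-around

def step(agents):
    """agents: dict name -> (x, y, dx). Returns updated positions."""
    # MA layer: each agent proposes its next cell in its goal direction.
    proposals = {}
    for name, (x, y, dx) in agents.items():
        proposals[name] = ((x + dx) % WIDTH, y)

    # CA layer: a cell may hold at most one agent; moves into occupied
    # cells or cells claimed by several agents are rejected.
    occupied = {(x, y) for (x, y, _) in agents.values()}
    claimed = {}
    for name, cell in proposals.items():
        claimed.setdefault(cell, []).append(name)

    new_agents = {}
    for name, (x, y, dx) in agents.items():
        cell = proposals[name]
        if len(claimed[cell]) == 1 and cell not in occupied:
            new_agents[name] = (cell[0], cell[1], dx)
        else:
            new_agents[name] = (x, y, dx)  # blocked: stay in place
    return new_agents
```

Keeping decision making (the loop over proposals) apart from conflict resolution (the occupancy check) is what lets each layer be changed independently, which is the benefit the abstract attributes to the MA + CA combination.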
A new reinforcement learning algorithm with complex-valued functions is proposed. The algorithm is inspired by complex-valued neural networks, which introduce complex numbers representing phase and amplitude into a conventional neural network. The strong advantage of using complex values in reinforcement learning is that the state-action function can be easily extended over a time series. In particular, by considering the coherence of each complex value, the proposed learning algorithm can represent the context of agent behavior. This extension compensates for the perceptual aliasing problem and enables intelligent behavior of mobile robots in the real world. The complex-valued functions are applied to the conventional reinforcement learning algorithms Q-learning and profit sharing. These algorithms are evaluated on simple maze problems and a bar-carrying task involving perceptual aliasing. Simulation experiments show that the new algorithm can efficiently solve perceptual aliasing.
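One reading of the phase-and-amplitude idea can be sketched as follows (a hedged simplification, not the paper's exact algorithm): each Q(s, a) is a complex number whose amplitude plays the role of the usual value, while its phase encodes the temporal context in which the pair was visited. Action selection prefers entries whose phase is coherent with a reference phase that rotates each step, so two perceptually aliased states visited in different contexts can be told apart. The rotation constant `BETA` and the coherence score are illustrative assumptions.

```python
import cmath
from collections import defaultdict

BETA = cmath.exp(1j * cmath.pi / 6)   # per-step phase rotation (assumed)
ALPHA, GAMMA = 0.1, 0.9               # learning rate, discount factor

Q = defaultdict(complex)              # (state, action) -> complex value

def select_action(state, actions, ref_phase):
    # Coherence score: projection of Q onto the reference phase direction,
    # so both amplitude and phase agreement influence the choice.
    def coherence(a):
        return (Q[(state, a)] * ref_phase.conjugate()).real
    return max(actions, key=coherence)

def update(state, action, reward, next_state, actions, ref_phase):
    # The bootstrapped term is rotated by BETA so that values learned at
    # successive time steps stay phase-aligned along a trajectory.
    best_next = max((Q[(next_state, a)] for a in actions),
                    key=abs, default=0j)
    target = reward * ref_phase + GAMMA * best_next * BETA
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
```

With real-valued Q, two aliased observations share one table entry; here the phase of the stored value carries the extra contextual bit that disambiguates them.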
SUMMARY: The results of imposing limitations on the number of states and of promoting the splitting of states in Q-learning are presented. Q-learning is a common reinforcement learning method in which the learning agent autonomously segments the environment states. In situations where the designer of an agent is unable to explicitly provide the agent with the boundaries of states in the environment in which the agent is acting, the agent needs to learn while simultaneously determining the internal discrete states that are needed in order to take the appropriate actions. A simple method of segmenting states based on a reinforcement signal (QLASS) has been proposed for this purpose. However, the original method suffers from the problem that the number of states grows excessively large as learning proceeds. A method is therefore proposed that defines temperature and eligibility attributes for each of the internal discrete states of the agent, limits and adds to the number of internal discrete states, and promotes random actions depending on the values of these attributes. The results of applying the proposed method to a number of tasks, including tasks that incorporate a dynamic environment, are compared to the QLASS method when only the reinforcement signal is used, and a similar level of learning is found to be achieved using fewer states. Furthermore, it is found that tasks can be completed in a small number of steps even when only a small number of trials are used for learning.
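The reinforcement-driven splitting with a state limit can be sketched minimally (an illustrative simplification in the spirit of QLASS, not the proposed method itself): a discrete internal state covering a sensor interval is split in two when the reinforcement signals observed in it are too inconsistent, and a hard cap keeps the state count bounded. The 1-D interval representation, variance criterion, and the constants are all assumptions for illustration.

```python
MAX_STATES = 16         # cap on the number of internal states (assumed)
SPLIT_THRESHOLD = 1.0   # reward-variance threshold for splitting (assumed)

class InternalState:
    def __init__(self, low, high):
        self.low, self.high = low, high   # sensor interval this state covers
        self.rewards = []                 # reinforcement signals observed here

    def variance(self):
        if len(self.rewards) < 2:
            return 0.0
        m = sum(self.rewards) / len(self.rewards)
        return sum((r - m) ** 2 for r in self.rewards) / len(self.rewards)

def maybe_split(states):
    """Split the most inconsistent state, respecting the state limit."""
    if len(states) >= MAX_STATES:
        return states                     # limit reached: no further growth
    worst = max(states, key=InternalState.variance)
    if worst.variance() <= SPLIT_THRESHOLD:
        return states                     # reinforcement signal is consistent
    mid = (worst.low + worst.high) / 2
    states.remove(worst)
    states += [InternalState(worst.low, mid), InternalState(mid, worst.high)]
    return states
```

The cap mirrors the abstract's point: without a limit, splitting alone lets the state count grow without bound, whereas bounding it forces the agent to achieve comparable learning with fewer states.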