Generative adversarial imitation learning (GAIL) has attracted increasing attention in the field of robot learning. It enables a robot to learn a policy for a task demonstrated by an expert while simultaneously estimating the reward function underlying the expert's behavior. However, this framework is limited to learning a single task with a single reward function. This study proposes an extended framework, situated GAIL (S-GAIL), in which a task variable is introduced into both the discriminator and the generator of the GAIL framework. The task variable discriminates between different contexts, allowing the framework to learn a distinct reward function and policy for each of multiple tasks. To achieve early convergence and robust reward estimation, we introduce a term that adjusts the entropy regularization coefficient in the generator's objective function. Our experiments in two setups (navigation in a discrete grid world and arm reaching in a continuous space) demonstrate that the proposed framework acquires multiple reward functions and policies more effectively than existing frameworks. The task variable enables our framework to differentiate between contexts while sharing common knowledge among tasks.
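The core architectural change is small enough to sketch in code. The PyTorch fragment below is a minimal illustration, under our own assumptions about layer sizes, naming, and the annealing schedule, of how a task variable can be fed to both the discriminator and the generator; it is not the authors' implementation.

```python
# Minimal S-GAIL-style sketch: both networks receive a task variable c
# in addition to the state (and action). Sizes and names are illustrative
# assumptions, not taken from the paper.
import torch
import torch.nn as nn

class TaskConditionedDiscriminator(nn.Module):
    """D(s, a, c): separates expert pairs from generated ones per context c."""
    def __init__(self, state_dim, action_dim, task_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + task_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action, task):
        # Concatenating the task variable lets the network learn a
        # separate reward surface for each context.
        x = torch.cat([state, action, task], dim=-1)
        return torch.sigmoid(self.net(x))

class TaskConditionedPolicy(nn.Module):
    """pi(a | s, c): one generator shared across tasks, switched by c."""
    def __init__(self, state_dim, action_dim, task_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + task_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, task):
        return self.net(torch.cat([state, task], dim=-1))

def entropy_coefficient(step, beta0=1.0, decay=0.999):
    # Hypothetical schedule: the abstract says the entropy regularization
    # coefficient in the generator's objective is adjusted during training;
    # an exponential decay is one simple stand-in for such a term.
    return beta0 * decay ** step
```

Because the two networks share all weights across tasks and differ only in the task input, common structure among tasks is reused while the contexts stay distinguishable, which matches the role the abstract assigns to the task variable.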
This article concerns the car-following behavior of drivers of heavy-duty vehicles, analyzed using a driver model. We first focus on how drivers obtain the information they use to control their vehicles longitudinally. The model describes two kinds of driver control: feed-forward and feed-back. Using multiple regression analysis, a feed-forward model was constructed, taking the time delay and the cut-off frequency of the information into account. A feed-back model was also constructed for the information that feed-forward control cannot describe. Combining the feed-forward and feed-back models yields a longitudinal (fore-and-aft) control model, and we found that this driver model describes actual driver behavior well. Finally, the major factors involved in conveying control information to the driver are clarified using factor analysis.
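The combined structure can be illustrated with a toy simulation. The Python sketch below pairs a delayed, low-pass-filtered feed-forward term (the first-order lag standing in for the cut-off frequency) with a feed-back correction on the spacing error; the gains, delay, and time constant are illustrative assumptions, not values estimated in the study.

```python
# Toy longitudinal car-following controller combining the two control
# paths described above. All parameter values are illustrative.
import numpy as np

def follow(lead_speed, dt=0.1, delay_s=0.5, tau=1.0,
           k_ff=0.8, k_gap=0.3, desired_gap=30.0):
    n = len(lead_speed)
    delay = int(delay_s / dt)
    v = np.zeros(n)                    # follower speed [m/s]
    gap = np.full(n, desired_gap)      # inter-vehicle gap [m]
    ff = 0.0                           # low-pass-filtered feed-forward signal
    for t in range(1, n):
        # Feed-forward: delayed, low-pass-filtered relative speed.
        rel = lead_speed[max(t - delay, 0)] - v[t - 1]
        ff += (dt / tau) * (rel - ff)  # first-order lag, cut-off ~ 1/tau
        # Feed-back: correct the spacing error the feed-forward path misses.
        fb = k_gap * (gap[t - 1] - desired_gap)
        accel = k_ff * ff + fb
        v[t] = v[t - 1] + accel * dt
        gap[t] = gap[t - 1] + (lead_speed[t] - v[t]) * dt
    return v, gap

# Example: with a constant lead speed of 20 m/s and these toy gains,
# the follower settles at the lead speed and the desired gap.
v, gap = follow(np.full(600, 20.0))
```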
Ryo IWAKI, Nonmember, Hiroki YOKOYAMA and Minoru ASADA, Members
SUMMARY: The step size is a parameter of fundamental importance in learning algorithms, particularly for natural policy gradient (NPG) methods. We derive an upper bound on the step size for incremental NPG estimation and propose an adaptive step size that implements the derived bound. The proposed adaptive step size guarantees that an updated parameter does not overshoot the target, which is achieved by weighting the learning samples according to their relative importance. We also provide tight upper and lower bounds on the step size, though they are not suitable for incremental learning. We confirm the usefulness of the proposed step size on classical benchmarks. To the best of our knowledge, this is the first adaptive step size method for NPG estimation.
Key words: reinforcement learning, natural policy gradient, adaptive step size
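The overshoot-prevention idea can be sketched in a few lines. The bound below, which caps the update length measured in the Fisher metric, is an illustrative stand-in for the paper's derived upper bound; in particular, it omits the sample weighting the abstract describes, and the names and the regularization constant are our own assumptions.

```python
# Minimal sketch of a bounded-step natural policy gradient update.
# `max_step` plays the role of an upper bound that keeps the update
# from overshooting; it is not the paper's derived formula.
import numpy as np

def npg_update(theta, grad, fisher, max_step=0.1, reg=1e-3):
    # Natural gradient direction: F^{-1} g (regularized for stability).
    nat_grad = np.linalg.solve(fisher + reg * np.eye(len(theta)), grad)
    # Step length in the Fisher metric: sqrt(g^T F^{-1} g).
    metric_norm = np.sqrt(max(grad @ nat_grad, 1e-12))
    # Shrink the step whenever it would exceed the bound, so the
    # updated parameter cannot overshoot the target.
    alpha = min(1.0, max_step / metric_norm)
    return theta + alpha * nat_grad
```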