2012 IEEE Congress on Evolutionary Computation
DOI: 10.1109/cec.2012.6256507
A review of inverse reinforcement learning theory and recent advances

Cited by 35 publications (17 citation statements)
References 29 publications
“…This has the advantage that instead of requiring the developer to explicitly specify a reward function, they simply have to demonstrate the intended behaviour. This can be advantageous since in large and complex tasks, defining an adequate reward function to provide optimal agent behaviour can be both difficult and time consuming [130]. IRL approaches have been shown to not only reduce the amount of time required for design and optimisation, but also improve the system performance by creating more robust reward functions.…”
Section: Simultaneous Lateral and Longitudinal Control Systems
mentioning confidence: 99%
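To make the demonstration-driven setup in the excerpt above concrete, here is a minimal Python sketch, not taken from the reviewed paper or the citing work, of the quantity many IRL methods extract from demonstrations: discounted feature expectations averaged over the expert's trajectories. The feature map phi, the (state, action) trajectory format, and the discount gamma are illustrative assumptions.

import numpy as np

def feature_expectations(trajectories, phi, gamma=0.99):
    # Empirical discounted feature counts, averaged over expert trajectories.
    # Each trajectory is assumed to be a sequence of (state, action) pairs,
    # and phi(s, a) is assumed to return a NumPy feature vector.
    mu = None
    for traj in trajectories:
        f = sum((gamma ** t) * phi(s, a) for t, (s, a) in enumerate(traj))
        mu = f if mu is None else mu + f
    return mu / len(trajectories)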
“…Therefore, hand tuning of the derived reward function may be required to ensure safe behaviour. Lastly, the computational burden of IRL methods can be heavy since they often require iteratively solving reinforcement learning problems with each new reward function derived [130]. Nevertheless, in tasks where an adequately accurate reward function cannot be easily defined, IRL approaches can provide an effective solution.…”
Section: Simultaneous Lateral and Longitudinal Control Systems
mentioning confidence: 99%
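The iterative structure behind that computational burden can be sketched as follows. This is a generic feature-matching style loop under a linear-reward assumption, not the exact algorithm of any cited work; solve_mdp and policy_feature_expectations are hypothetical callables supplied by the caller.

import numpy as np

def irl_feature_matching(mu_expert, phi_dim, solve_mdp, policy_feature_expectations,
                         n_iters=50, lr=0.1):
    # Linear reward assumption: R(s, a) = w . phi(s, a).
    w = np.zeros(phi_dim)
    for _ in range(n_iters):
        policy = solve_mdp(w)                      # inner RL solve, repeated for every candidate reward
        mu_policy = policy_feature_expectations(policy)
        w += lr * (mu_expert - mu_policy)          # nudge the reward so the policy's features match the expert's
    return w

Each pass through the loop requires a full forward RL solve for the current reward weights, which is exactly why IRL is often reported as computationally heavy.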
“…A main limitation of all the IRL variants above is that the reward function is assumed to be a linear combination of features. The papers [33] and [34] proposed an extended approach using a limited set of non-linear rewards. [19] applies Gaussian processes, a non-parametric method, to cater for potentially complex non-linear reward functions. Although in principle this extends the IRL paradigm to the entire range of non-linear reward functions, the use of kernel machines means the method tends to require a large number of reward samples to approximate complex reward functions [35].…”
Section: Inverse Reinforcement Learning
mentioning confidence: 99%
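To illustrate the two reward representations contrasted in the excerpt, the sketch below places a linear feature-based reward next to a Gaussian-process posterior mean used as a non-parametric, non-linear reward estimate. The RBF kernel, the noise term, and the reward samples (X_train, y_train) are assumptions of this sketch rather than details taken from [19] or [35]; the dependence on reward samples is what the excerpt's last sentence refers to.

import numpy as np

def linear_reward(w, phi, s, a):
    # Classic IRL assumption: reward is a linear combination of features.
    return float(np.dot(w, phi(s, a)))

def rbf_kernel(x, y, length_scale=1.0):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.exp(-np.sum((x - y) ** 2) / (2.0 * length_scale ** 2)))

def gp_mean_reward(x_query, X_train, y_train, noise=1e-3):
    # Gaussian-process posterior mean at x_query, given observed reward
    # samples (X_train, y_train); richer reward shapes need more samples.
    K = np.array([[rbf_kernel(a, b) for b in X_train] for a in X_train])
    k_star = np.array([rbf_kernel(x_query, b) for b in X_train])
    alpha = np.linalg.solve(K + noise * np.eye(len(X_train)), np.asarray(y_train, dtype=float))
    return float(k_star @ alpha)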
“…This approach has a first stage, wherein a human begins controlling the agent. Then, in a second stage, a reward function is derived with Inverse Reinforcement Learning (IRL) (Ng and Russell 2000; Zhifei and Joo 2012) from the collected demonstrations. Finally, this reward function is used in a standard RL process.…”
Section: Background and Related Work
mentioning confidence: 99%
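A minimal sketch of the staged pipeline described above is given below; collect_demos, run_irl, and train_rl are hypothetical stage functions supplied by the caller, not APIs from the cited works.

def demonstration_bootstrapped_rl(env, collect_demos, run_irl, train_rl):
    # Stage 1: a human controls the agent to produce demonstrations.
    demos = collect_demos(env)
    # Stage 2: IRL derives a reward function from the collected demonstrations.
    reward_fn = run_irl(env, demos)
    # Stage 3: standard RL is run against the learned reward function.
    return train_rl(env, reward_fn)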