Reinforcement learning, although a highly popular learning technique for agents and multi-agent systems, has so far encountered difficulties when applied to more complex domains because of problems of scale. This paper focuses on the use of domain knowledge to improve the convergence speed and optimality of various RL techniques. Specifically, we propose the use of high-level STRIPS operator knowledge in reward shaping to focus the search for the optimal policy. Empirical results show that the plan-based reward shaping approach outperforms other RL techniques, including alternative manual and MDP-based reward shaping, when it is used in its basic form. We show that MDP-based reward shaping may fail, and successful experiments with STRIPS-based shaping suggest modifications that can overcome the encountered problems. The proposed STRIPS-based method allows the same domain knowledge to be expressed in a different form, so the domain expert can choose whether to define an MDP or a STRIPS planning task. We also evaluate the robustness of the proposed STRIPS-based technique to errors in the plan knowledge.
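To make the idea concrete, the following is a minimal sketch of how a high-level plan can induce a shaping potential, assuming the standard potential-based form F(s, s') = γΦ(s') − Φ(s) and a hypothetical abstract() mapping from low-level states to plan-step indices; the function names and the scaling constant are illustrative, not the paper's exact formulation.

```python
# Minimal sketch of plan-based potential shaping (illustrative only).
# Assumed setup: a high-level STRIPS plan given as an ordered list of
# abstract states, and a hypothetical abstract() mapping that returns the
# index of the last plan step achieved by a low-level state.

GAMMA = 0.99   # discount factor of the underlying MDP
OMEGA = 100.0  # scaling factor for the shaping signal (assumed value)

def potential(state, plan_steps, abstract):
    """Phi(s): progress through the plan, scaled by OMEGA."""
    return OMEGA * abstract(state) / len(plan_steps)

def shaped_reward(reward, s, s_next, plan_steps, abstract):
    """r + F(s, s'), with the potential-based term gamma*Phi(s') - Phi(s)."""
    return (reward
            + GAMMA * potential(s_next, plan_steps, abstract)
            - potential(s, plan_steps, abstract))
```

Because the shaping signal is a difference of potentials, an agent is rewarded only for net progress along the plan, so looping between plan steps yields no extra reward.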
In recent years, multi-agent systems (MASs) have received increasing attention in the artificial intelligence community. Research in multi-agent systems involves the investigation of autonomous, rational and flexible behaviour of entities such as software programs or robots, and of their interaction and coordination in such diverse areas as robotics (Kitano et al., 1997), information retrieval and management (Klusch, 1999), and simulation (Gilbert & Conte, 1995). When designing agent systems, it is impossible to foresee all the potential situations an agent may encounter and to specify its behaviour optimally in advance. Agents therefore have to learn from, and adapt to, their environment, especially in a multi-agent setting.
This paper investigates the impact of reward shaping in multi-agent reinforcement learning as a way to incorporate domain knowledge about good strategies. In theory, potential-based reward shaping does not alter the Nash equilibria of a stochastic game; it changes only the exploration behaviour of the shaped agent. We demonstrate the performance of reward shaping empirically in two problem domains within the context of RoboCup KeepAway by designing three reward-shaping schemes that encourage specific behaviour, such as keeping a minimum distance from other players on the same team and taking on specific roles. The results illustrate that reward shaping with multiple, simultaneous learning agents can reduce the time needed to learn a suitable policy and can alter the final group performance.
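As an illustration of one such scheme, the sketch below encodes the "keep a minimum distance from teammates" idea as a potential function; plugging any such Φ into the standard potential-based term F(s, s') = γΦ(s') − Φ(s) is what leaves the equilibria unchanged. The threshold MIN_DIST and the linear penalty are assumptions made for this sketch, not the paper's exact parameters.

```python
import math

# Hypothetical potential for a "keep your distance" shaping scheme in
# KeepAway. Phi is zero when every teammate is at least MIN_DIST away and
# grows more negative as teammates crowd together. MIN_DIST and the linear
# penalty are illustrative assumptions.

MIN_DIST = 10.0  # assumed threshold, in field units

def distance_potential(my_pos, teammate_positions):
    """Phi(s) for one keeper, given (x, y) positions."""
    phi = 0.0
    for tx, ty in teammate_positions:
        d = math.hypot(my_pos[0] - tx, my_pos[1] - ty)
        phi -= max(0.0, MIN_DIST - d)  # penalise proximity below the threshold
    return phi

def shaping_term(phi_s, phi_s_next, gamma=0.99):
    """Potential-based shaping F(s, s') = gamma*Phi(s') - Phi(s)."""
    return gamma * phi_s_next - phi_s
```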
Potential-based reward shaping has been shown to be a powerful method for improving the convergence rate of reinforcement learning agents. It is a flexible technique for incorporating background knowledge into temporal-difference learning in a principled way. However, the question remains of how to compute the potential function that is used to shape the reward given to the learning agent. In this paper, we show how, in the absence of knowledge with which to define the potential function manually, this function can be learned online, in parallel with the actual reinforcement learning process. Two cases are considered. The first solution, based on multi-grid discretisation, is designed for model-free reinforcement learning. In the second case, an approach for the prototypical model-based R-max algorithm is proposed; it learns the potential function using the free-space assumption about the transitions in the environment. Two novel algorithms are presented and evaluated.
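A rough sketch of the model-free idea, under stated assumptions: maintain a value estimate on a coarser discretisation of the state space, update it online from the same stream of experience, and use it as Φ for the fine-grained learner. The coarsening factor, learning rate and TD(0) update rule below are illustrative choices, not necessarily the algorithm as published.

```python
# Illustrative sketch: learn the shaping potential online on a coarse grid
# while the main learner runs at the fine resolution. All constants and the
# TD(0) update rule are assumptions made for this example.

COARSE_FACTOR = 4          # fine cells per coarse cell along each axis
ALPHA, GAMMA = 0.1, 0.99   # assumed learning rate and discount factor

coarse_v = {}              # value table on the coarse grid, learned online

def coarse_cell(state):
    """Project a fine-grid state (x, y) onto its coarse cell."""
    x, y = state
    return (x // COARSE_FACTOR, y // COARSE_FACTOR)

def update_potential(s, reward, s_next):
    """TD(0) update of the coarse value function that serves as Phi."""
    c, c_next = coarse_cell(s), coarse_cell(s_next)
    v, v_next = coarse_v.get(c, 0.0), coarse_v.get(c_next, 0.0)
    coarse_v[c] = v + ALPHA * (reward + GAMMA * v_next - v)

def potential(s):
    """Phi(s): current coarse value estimate for s's cell."""
    return coarse_v.get(coarse_cell(s), 0.0)
```

Because the coarse table aggregates many fine states, its value estimates become informative long before the fine-grained learner converges, which is what allows the potential to guide exploration early on.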