Towards Model-Based Reinforcement Learning for Industry-Near Environments

Andersen, Per‐Arne; Goodwin, Morten; Granmo, Ole‐Christoffer

doi:10.1007/978-3-030-34885-4_3

Cited by 5 publications

(4 citation statements)

References 6 publications

“…Andersen et al [60], in their paper, proposed a new "Dreaming Variational Autoencoder" model to speed up detection of potential threats. The authors cite that often expert systems already exist in fully automated warehouses, but are not flexible enough to work in dynamic environments.…”

Section: Deep Learning Methods (mentioning)

confidence: 99%

Motion Trajectory Prediction in Warehouse Management Systems: A Systematic Literature Review

Belter, Hering, Weichbroth. 2023. Applied Sciences

Background: In the context of Warehouse Management Systems, knowledge related to motion trajectory prediction methods utilizing machine learning techniques seems to be scattered and fragmented. Objective: This study seeks to fill this research gap by using a systematic literature review approach. Methods: Based on the data collected from Google Scholar, a systematic literature review was performed, covering the period from 2016 to 2023. The review was driven by a protocol that comprises inclusion and exclusion criteria to identify relevant papers. Results: For Warehouse Management Systems, five categories of motion trajectory prediction methods were identified: Deep Learning methods, probabilistic methods, methods for solving the Travelling-Salesman problem (TSP), algorithmic methods, and others. The analysis also provides the research community with an overview of the state-of-the-art methods, which can further stimulate researchers and practitioners to enhance existing methods and develop new ones in this field.

“…Similarly, Anderson et al [1] proposed Dreaming Variational Autoencoder, an architecture for modeling the environment using VAE and RNN, which uses the real trajectories from the actual environment to imitate the behavior of the actual environment. Conversely, Anderson et al [2] found that in high-dimensional tasks, simple heuristics exploration are often trapped in local minima of the state space, which may cause the generative model to become inaccurate or even collapse.…”

Section: Related Work (mentioning)

confidence: 99%

“…(1) DQN and DDQN are deep Q learning [13,14] and double deep Q learning [7], which are benchmark comparison algorithms; (2) CDQN and CDDQN are DQN or DDQN based on curiosity-driven exploration [5,15]; (3) DQN-VAE and DDQN-VAE add a VAE structure to DQN or DDQN, which only uses the VAE model to alleviate insufficient sample diversity; (4) DQN-CVAE and DDQN-CVAE are our proposed algorithms that combine (2) and (3). It was different from (3) in that we use curiosity-driven exploration to improve the efficiency of exploration.…”

Section: Evaluation Criteria and Comparison Algorithms (mentioning)

confidence: 99%

“…When the generative model is sufficiently trained, the DRL algorithm can be trained without interacts with the actual environment. In [1,2,6,8], it is confirmed that the agent can learn the optimal policy only use generate training samples. However, these generative models may become inaccurate and even collapse where the state-action pair insufficient explored [1,2,6].…”

Section: Introduction (mentioning)

confidence: 99%

Curiosity-Driven Variational Autoencoder for Deep Q Network

Han, Zhang, Wang, et al. 2020. Advances in Knowledge Discovery and Data Mining

In recent years, deep reinforcement learning (DRL) has achieved tremendous success in high-dimensional and large-scale space control and sequential decision-making tasks. However, the current model-free DRL methods suffer from low sample efficiency, which is a bottleneck that limits their performance. To alleviate this problem, some researchers used the generative model for modeling the environment. But the generative model may become inaccurate or even collapse if the state has not been sufficiently explored. In this paper, we introduce a model called Curiosity-driven Variational Autoencoder (CVAE), which combines variational autoencoder and curiosity-driven exploration. During the training process, the CVAE model can improve sample efficiency while curiosity-driven exploration can make sufficient exploration in a complex environment. Then, a CVAE-based algorithm is proposed, namely DQN-CVAE, that scales CVAE to higher dimensional environments. Finally, the performance of our algorithm is evaluated through several Atari 2600 games, and the experimental results show that the DQN-CVAE achieves better performance in terms of average reward per episode on these games.
