2022
DOI: 10.3390/drones6110365
Improved Dyna-Q: A Reinforcement Learning Method Focused via Heuristic Graph for AGV Path Planning in Dynamic Environments

Abstract: Dyna-Q is a reinforcement learning method widely used in AGV path planning. However, in large, complex dynamic environments, the sparse reward function of Dyna-Q and the large search space lead to low search efficiency, slow convergence, and in some cases a failure to converge, which seriously reduces its performance and practicality. To solve these problems, this paper proposes an Improved Dyna-Q algorithm for AGV path planning in large, complex dynamic environments. Firs…

Cited by 8 publications (4 citation statements) · References 30 publications
“…While Dyna-Q is often more efficient than Q-Learning when it comes to trajectory planning problems, it is also recognized for its low search efficiency, slow convergence speed, and, in some cases, inability to converge in complex dynamic environments. These issues arise due to the sparse reward function and the vast search space involved [46]. Although the Taxi task was not considered a priori as a complex scenario, it is true that the positive reward is quite sparse, since it is only obtained when the agent succeeds in releasing the passenger at the destination.…”
Section: Policy Learning
confidence: 99%
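The statement above contrasts Dyna-Q with plain Q-Learning: Dyna-Q augments each real environment step with simulated "planning" updates replayed from a learned model, which speeds value propagation but still struggles when rewards are sparse. The following is a minimal textbook-style sketch of tabular Dyna-Q, not the cited paper's improved variant; the `Corridor` environment and all hyperparameters are illustrative assumptions chosen to exhibit a sparse reward.

```python
import random
from collections import defaultdict

class Corridor:
    """Illustrative 1-D gridworld with a sparse reward: the agent starts at
    cell 0 and receives +1 only upon reaching the goal cell."""
    actions = (-1, +1)

    def __init__(self, length=6):
        self.length = length

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, a):
        self.pos = max(0, min(self.length, self.pos + a))
        done = self.pos == self.length
        return self.pos, (1.0 if done else 0.0), done

def dyna_q(env, episodes=50, planning_steps=10, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Dyna-Q: every real transition triggers one direct Q-learning
    update plus `planning_steps` simulated updates replayed from a learned
    deterministic model."""
    Q = defaultdict(float)   # Q[(state, action)] -> estimated return
    model = {}               # model[(state, action)] -> (reward, next_state, done)

    def greedy(s):
        best = max(Q[(s, a)] for a in env.actions)
        return random.choice([a for a in env.actions if Q[(s, a)] == best])

    def update(s, a, r, s2, done):
        target = r if done else r + gamma * max(Q[(s2, b)] for b in env.actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = random.choice(env.actions) if random.random() < epsilon else greedy(s)
            s2, r, done = env.step(a)
            update(s, a, r, s2, done)          # direct RL step (Q-learning)
            model[(s, a)] = (r, s2, done)      # model learning step
            for _ in range(planning_steps):    # planning: replay remembered transitions
                (ps, pa), (pr, ps2, pd) = random.choice(list(model.items()))
                update(ps, pa, pr, ps2, pd)
            s = s2
    return Q
```

Because the only nonzero reward sits at the goal, the planning replays are what propagate value backward quickly; with a much larger state space or a moving goal, that same sparsity is what causes the slow or failed convergence the citing authors describe.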
“…Experimental validation indicates the effectiveness of this method in reducing the waiting time for CNC machines and enhancing overall production efficiency. Liu et al [24] applied an improved reinforcement learning method to AGV path planning. They designed a new dynamic reward function and action selection method, ultimately demonstrating through experimental cases that the method can effectively assist AGVs in obtaining better paths.…”
Section: Introduction
confidence: 99%
“…AGVs have advantages such as their high work efficiency, low cost, and high safety. The research on AGVs includes navigation algorithms [9,10], path planning algorithms [11,12], path tracking algorithms [13,14], and vehicle scheduling algorithms [15,16]. Of these, path planning determines the route by which the AGV will move, and plays a fundamental role in determining the operating costs.…”
Section: Introduction
confidence: 99%