“…Furthermore, a noisy network [22] with factorized Gaussian noise injected into the parameters of the online network is introduced to solve the exploration limitation in the DDQN, which can automatically adjust the randomness of action selection to find a better balance between exploration and exploitation. Additionally, an n-step temporal difference learning mechanism is adopted to alleviate the estimation error of the target Q-value, moving much closer to the real target Q-value [23], coupled with a dueling network structure to further alleviate overestimation issues and improve the stability and accuracy of learning by decomposing the Q-value into a state-value function and an advantage function, allowing the network to better learn the relationship between the state value and action advantage [24,25].…”
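The two estimators named in this statement, the n-step temporal difference target and the dueling decomposition, can be illustrated with a minimal sketch. The function names are hypothetical, and the mean-subtracted aggregation is an assumption: it is the commonly used form of the dueling architecture, but the excerpt does not say which aggregation the cited paper uses.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a).

    Assumption: the mean advantage is subtracted so that V and A are
    identifiable; the cited paper may use a different aggregation.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


def n_step_target(rewards, bootstrap_q, gamma):
    """Return r_0 + gamma*r_1 + ... + gamma^(n-1)*r_(n-1) + gamma^n * bootstrap_q.

    `rewards` holds the n observed rewards; `bootstrap_q` is the Q-value
    estimate at the state reached after n steps.
    """
    target = bootstrap_q
    # Fold from the last reward backwards so each step applies one discount.
    for r in reversed(rewards):
        target = r + gamma * target
    return target
```

With `n = 1` the second function reduces to the ordinary one-step target, which is one way to see n-step learning as a generalization rather than a separate method.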
Effective real-time autonomous navigation for mobile robots in static and dynamic environments has become a challenging and active research topic. Although simultaneous localization and mapping (SLAM) offers a solution, it often relies heavily on complex global and local maps, resulting in significant computational demands, slower convergence rates, and prolonged training times. In response to these challenges, this paper presents a novel algorithm called PER-n2D3QN, which integrates prioritized experience replay, a noisy network with factorized Gaussian noise, n-step learning, and a dueling structure into a double deep Q-network. This combination enhances the efficiency of experience replay, facilitates exploration, and provides more accurate Q-value estimates, thereby significantly improving the performance of autonomous navigation for mobile robots. To further bolster stability and robustness, additional improvements, such as target “soft” updates and a gradient clipping mechanism, are employed. A novel target-oriented reshaping reward function is also designed to expedite learning. The proposed model is validated through extensive experiments using the Robot Operating System (ROS) and the Gazebo simulation environment. To characterize the complexity of the simulation environment more precisely, the paper also presents a quantitative analysis of it. The experimental results demonstrate that PER-n2D3QN exhibits heightened accuracy, accelerated convergence rates, and enhanced robustness in both static and dynamic scenarios.
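The stabilizing pieces mentioned in the abstract (a double deep Q-network target, target "soft" updates, and gradient clipping) can be sketched on plain Python lists. This is an illustrative sketch under stated assumptions, not the paper's implementation: all function names and the parameters `tau` and `max_norm` are hypothetical, parameters and gradients are flattened to lists of floats, and clipping by global L2 norm is assumed because the abstract does not specify the clipping rule.

```python
import math


def double_dqn_target(reward, gamma, online_next_q, target_next_q, done):
    """Double DQN target: the online network picks the next action,
    the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(online_next_q)), key=online_next_q.__getitem__)
    return reward + gamma * target_next_q[best_action]


def soft_update(target_params, online_params, tau):
    """Polyak-style soft update: target <- (1 - tau) * target + tau * online."""
    return [(1.0 - tau) * t + tau * o
            for t, o in zip(target_params, online_params)]


def clip_by_norm(grads, max_norm):
    """Rescale gradients so their L2 norm does not exceed max_norm.

    Assumption: norm-based clipping; the paper may clip by value instead.
    """
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]
```

Separating action selection from action evaluation is what distinguishes the double DQN target from the standard one, which takes the maximum over the target network's own estimates.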
Purpose: The field of autonomous mobile robots (AMRs) has experienced significant growth in recent years, propelled by advancements in autonomous driving and unmanned aerial vehicles (UAVs). The integration of intelligence into robotic systems necessitates addressing various research challenges, with navigation emerging as a pivotal aspect of mobile robotics. This paper explores the three fundamental questions central to the navigation problem: localization (determining the robot’s position), mapping (creating a representation of the environment), and path planning (determining the optimal route to the destination). The proposed solution to the mobile robot navigation problem involves the seamless integration of these three foundational navigation components.
Methods: Our comparative analysis of the modified Q-learning method and a deep Q-network (DQN) on simulated gym pathfinding tasks demonstrates the efficacy of this approach. The modified Q-learning algorithm consistently outperforms DQN, demonstrating its superior ability to navigate complex environments and achieve optimal solutions. The transition from a definite environment to a simulated gym environment serves as a valuable validation of the method’s applicability in real-world scenarios. By rigorously evaluating our algorithm in a controlled setting, we can assess its robustness and effectiveness across a broader range of applications.
Results: In essence, our study establishes the modified Q-learning algorithm as a promising new approach to addressing the exploration-exploitation dilemma in reinforcement learning. Its superior performance in simulated gym environments suggests its potential for real-world applications in various domains, including robotics, autonomous navigation, and game development.
Conclusion: The paper provides a comprehensive overview of research on autonomous mobile robot navigation. It begins with a succinct introduction to the diverse facets of navigation, followed by an examination of the roles of machine learning and reinforcement learning in mobile robotics. It then surveys various path planning techniques. Finally, it presents a comparative analysis of two path planning methods for mobile robots: Q-learning with an enhanced exploration strategy and Deep Q-Network (DQN). A simulation study in a gym environment shows that the proposed Q-learning approach performs better than DQN.
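For readers unfamiliar with the baseline being modified, tabular Q-learning on a pathfinding task can be sketched as follows. Everything here is an assumption made for illustration: the abstract does not describe the enhanced exploration strategy, so a linearly decaying epsilon-greedy rule stands in for it, and a hypothetical five-state corridor replaces the gym environment so the example has no external dependencies.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


def q_update(q, s, a, r, s_next, done, alpha, gamma):
    """Standard one-step Q-learning update, applied in place."""
    best_next = 0.0 if done else max(q[s_next])
    q[s][a] += alpha * (r + gamma * best_next - q[s][a])


def train_corridor(n_states=5, episodes=200, alpha=0.5, gamma=0.9, seed=0):
    """Learn to walk right along a corridor; reward 1.0 at the last state."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # actions: 0 = left, 1 = right
    for ep in range(episodes):
        # Assumed exploration schedule: epsilon decays linearly to a floor.
        epsilon = max(0.05, 1.0 - ep / episodes)
        s = 0
        for _ in range(100):  # step cap per episode
            a = epsilon_greedy(q[s], epsilon, rng)
            s_next = max(0, s - 1) if a == 0 else s + 1
            done = s_next == n_states - 1
            r = 1.0 if done else 0.0
            q_update(q, s, a, r, s_next, done, alpha, gamma)
            s = s_next
            if done:
                break
    return q
```

After training, the greedy action in every non-terminal state should be "right", which is the optimal path in this toy corridor. A DQN replaces the table `q` with a neural network, which is the comparison the abstract refers to.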