2021
DOI: 10.3389/frobt.2021.738113

Comparing Deep Reinforcement Learning Algorithms’ Ability to Safely Navigate Challenging Waters

Abstract: Reinforcement Learning (RL) controllers have proved effective at the dual objectives of path following and collision avoidance. However, it is not obvious which RL algorithm setup optimally trades off these two tasks. This work proposes a methodology for exploring that question by analyzing the performance and task-specific behavioral characteristics of a range of RL algorithms applied to path following and collision avoidance for underactuated surface vehicles in environments of increa…

Cited by 20 publications (7 citation statements) · References 25 publications
“…We prefer DDPG over TRPO because DDPG is computationally less expensive than TRPO, making it a better choice for problems with a large state or action space or where data collection is expensive. DDPG is generally more sample-efficient than TRPO and has been shown to converge faster than TRPO in some cases, making it a better choice for problems where the agent needs to learn quickly [36].…”

Section: Reinforcement Learning for Aerial-IRS Trajectory Optimization
confidence: 99%
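The sample-efficiency claim above hinges on DDPG being off-policy: transitions stored in a replay buffer remain valid training data and can be reused across many updates, whereas an on-policy method such as TRPO must collect a fresh batch under its current policy for every update. Below is a minimal Python sketch of that reuse pattern; it is illustrative only (not code from the cited works), and the environment interaction is a hypothetical placeholder.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size=64):
        # Off-policy reuse: old transitions stay valid training data,
        # so each environment step can feed many gradient updates.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

buffer = ReplayBuffer()
for step in range(1000):
    # Placeholder transition; a real agent would query an environment here.
    buffer.add((step, 0, 0.0, step + 1, False))
    if len(buffer.buffer) >= 64:
        batch = buffer.sample()  # a DDPG-style update would consume this batch

# TRPO, by contrast, gathers a fresh on-policy rollout for each policy
# update and discards it afterwards, which costs more environment samples.
```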
“…There are many model-free DRL frameworks developed in the past decade [36]–[40]. The key features of each framework include whether it is value optimization or policy optimization-based;…”

Section: Deep Reinforcement Learning Algorithm
confidence: 99%
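To make the value-optimization versus policy-optimization distinction concrete, the toy sketch below contrasts a tabular Q-learning step (value-based: improve an action-value estimate and act greedily on it) with a REINFORCE-style softmax policy-gradient step (policy-based: adjust policy parameters directly along the gradient of expected return). This is an illustrative sketch, not code from any of the cited frameworks; the array shapes, learning rates, and return `G` are assumed values.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Value optimization: move Q(s, a) toward a bootstrapped TD target."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

def policy_gradient_step(theta, s_onehot, a, G, alpha=0.01):
    """Policy optimization: REINFORCE update for a linear softmax policy."""
    logits = theta @ s_onehot
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Gradient of log pi(a|s) for a linear softmax parameterization.
    grad_log_pi = -np.outer(probs, s_onehot)
    grad_log_pi[a] += s_onehot
    return theta + alpha * G * grad_log_pi

Q = q_update(np.zeros((5, 2)), s=0, a=1, r=1.0, s_next=2)                 # value-based
theta = policy_gradient_step(np.zeros((2, 5)), np.eye(5)[0], a=1, G=1.0)  # policy-based
```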
“…3) On-Policy vs. Off-Policy in DRL: Policy-based DRL algorithms with stochastic policy can be further categorized into on-policy and off-policy learning methods [39]. SAC adopts an off-policy learning method, while TRPO and PPO are on-policy ones [40]. An off-policy algorithm learns the optimal policy (approximated by a target NN) that is different from the behavior policy (approximated by a behavior NN) for generating new experiences during training.…”

Section: Deep Reinforcement Learning Algorithm
confidence: 99%
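As an illustration of the behavior-network/target-network split described in this statement, the sketch below shows the Polyak (soft) target update used by off-policy actor-critic methods such as DDPG and SAC: the behavior network generates experience while a lagged target copy provides stable bootstrap targets. The network sizes and the `tau` coefficient are illustrative assumptions, not values from the cited papers.

```python
import copy
import torch
import torch.nn as nn

# Behavior network: selects actions (typically with exploration noise)
# and generates new experiences during training.
behavior_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))

# Target network: a lagged copy used to compute bootstrap targets, so the
# policy being learned differs from the one generating the experience.
target_net = copy.deepcopy(behavior_net)

def soft_update(target, source, tau=0.005):
    """Polyak averaging: target <- (1 - tau) * target + tau * source."""
    with torch.no_grad():
        for t_param, s_param in zip(target.parameters(), source.parameters()):
            t_param.mul_(1.0 - tau).add_(tau * s_param)

soft_update(target_net, behavior_net)  # called once per training step
```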
“…Additionally, Ryohei Sawada applied PPO in combination with LSTM neural networks to achieve autonomous ship collision avoidance in continuous action spaces [30]. Most recently, in 2021, Thomas Nakken Larsen compared the effectiveness of various DRL algorithms for safe navigation in challenging waterways [31].…”

Section: Introduction
confidence: 99%