2022
DOI: 10.1007/s40430-022-03399-w

Optimal path planning method based on epsilon-greedy Q-learning algorithm

Cited by 13 publications (7 citation statements)
References 27 publications

“…An ϵ-greedy algorithm was used to define whether an exploitative or an exploratory action a was taken. 53 A linear ϵ-decay was introduced to reduce convergence time, with ϵ starting at 1 and ending at 0.001. 54 Furthermore, the processing is split between three agents; each agent i is responsible for finding the optimal policy across three actions instead of 27.…”
Section: Methods (mentioning)
Confidence: 99%
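A minimal sketch of the linear ϵ-decay schedule this excerpt describes. The excerpt fixes only the endpoints (ϵ from 1 to 0.001); the episode budget and the action-space size here are assumptions for illustration.

```python
import numpy as np

EPS_START, EPS_END = 1.0, 0.001  # endpoints as stated in the excerpt
NUM_EPISODES = 5000              # assumed; not given in the excerpt

def linear_epsilon(episode: int) -> float:
    """Linearly decay epsilon from EPS_START to EPS_END over NUM_EPISODES."""
    frac = min(episode / (NUM_EPISODES - 1), 1.0)
    return EPS_START + frac * (EPS_END - EPS_START)

def choose_action(q_row: np.ndarray, epsilon: float, rng: np.random.Generator) -> int:
    """Epsilon-greedy choice: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))  # exploratory action
    return int(np.argmax(q_row))              # exploitative (greedy) action
```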
“…An ϵ-greedy algorithm was used to define whether an exploitative or an exploratory action a was taken. 53 A linear ϵ-decay was introduced to reduce convergence time, with ϵ starting at 1 and ending at 0.001. 54 …”
Section: Methods (mentioning)
Confidence: 99%
“…The proposed algorithms (Algorithm 2 and Algorithm 3) have some initialization parameters, such as α, the learning rate that moderates the speed of learning and the update of Q-values (we assume α = 0.5), and γ, the discount factor that quantifies the importance given to future rewards (in our approach we consider that future task placements are important, so we assign a sufficiently large value, γ = 0.9). To choose an action (i.e., for the placement or scheduling), Q-learning uses an ϵ-greedy policy [11]. The ϵ-greedy policy is an efficient randomized approach that selects a random action with probability ϵ, and with probability (1 − ϵ) the action with the highest estimated reward Q(S, a).…”
Section: Reward (R) (mentioning)
Confidence: 99%
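A minimal sketch of the ϵ-greedy selection and Q-learning update this excerpt describes, using the stated α = 0.5 and γ = 0.9. The ϵ value, state/action-space sizes, and environment interface are placeholders, not taken from the cited work.

```python
import numpy as np

ALPHA, GAMMA = 0.5, 0.9        # learning rate and discount factor as stated
EPSILON = 0.1                  # assumed; the excerpt does not fix epsilon
N_STATES, N_ACTIONS = 20, 4    # placeholder sizes

Q = np.zeros((N_STATES, N_ACTIONS))
rng = np.random.default_rng(0)

def select_action(state: int) -> int:
    """With probability epsilon pick a random action, else the greedy one."""
    if rng.random() < EPSILON:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(Q[state]))

def update(state: int, action: int, reward: float, next_state: int) -> None:
    """Q-learning update: Q <- Q + alpha * (r + gamma * max Q(s', .) - Q)."""
    td_target = reward + GAMMA * np.max(Q[next_state])
    Q[state, action] += ALPHA * (td_target - Q[state, action])
```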
“…Bulut proposed an improved epsilon-greedy Q-learning (IEGQL) algorithm to enhance efficiency and productivity with respect to path length and computational cost [18]. IEGQL introduces a reward function that provides the mobile robot with prior knowledge of the environment, and mathematical modeling is presented to support optimal action selection while ensuring rapid convergence.…”
Section: Related Work (mentioning)
Confidence: 99%
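The excerpt does not reproduce IEGQL's actual reward function. As a purely hypothetical illustration of the general idea of a reward that encodes prior knowledge of the environment (a known goal position and obstacle map), one might write:

```python
import numpy as np

# Illustrative only: NOT Bulut's IEGQL reward, which the excerpt does not give.
GOAL = np.array([9, 9])  # assumed known goal cell on a grid map

def shaped_reward(cell: np.ndarray, is_obstacle: bool, at_goal: bool) -> float:
    """Reward shaped by prior map knowledge: goal bonus, obstacle penalty,
    and a distance term that rewards cells closer to the known goal."""
    if at_goal:
        return 100.0   # large terminal bonus
    if is_obstacle:
        return -50.0   # penalty from the known obstacle map
    return -float(np.linalg.norm(GOAL - cell, ord=1))  # Manhattan-distance shaping
```

Because every cell's reward can be computed from the map before training, the agent effectively knows the environment in advance, which is the property the excerpt attributes to IEGQL's reward design.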