2019
DOI: 10.3390/sym11020290
Toward Self-Driving Bicycles Using State-of-the-Art Deep Reinforcement Learning Algorithms

Abstract: In this paper, we propose a controller for a bicycle using the DDPG (Deep Deterministic Policy Gradient) algorithm, which is a state-of-the-art deep reinforcement learning algorithm. We use a reward function and a deep neural network to build the controller. By using the proposed controller, a bicycle can not only be stably balanced but also travel to any specified location. We confirm that the controller with DDPG shows better performance than the other baselines such as Normalized Advantage Function (NAF) an…
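The abstract describes a DDPG controller: a deterministic actor maps states to continuous actions, a critic estimates Q(s, a), and both have slowly updated target copies. The following is a minimal sketch of one DDPG update step using linear function approximators instead of the paper's deep networks; all dimensions, learning rates, and the single fictitious transition are illustrative assumptions, not the authors' setup.

```python
# Sketch of one DDPG update (linear actor/critic for brevity; the paper
# uses deep neural networks). All shapes and constants are assumptions.
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim = 4, 1
gamma, tau, lr = 0.99, 0.005, 1e-2

# Actor: a = W_mu @ s.  Critic: Q(s, a) = w_q @ concat(s, a).
W_mu = rng.normal(size=(action_dim, state_dim)) * 0.1
w_q = rng.normal(size=state_dim + action_dim) * 0.1
W_mu_t, w_q_t = W_mu.copy(), w_q.copy()  # slowly updated target networks

# One fictitious transition (s, a, r, s'), with exploration noise on a.
s = rng.normal(size=state_dim)
a = W_mu @ s + 0.1 * rng.normal(size=action_dim)
r, s2 = 1.0, rng.normal(size=state_dim)

# Critic: regress toward the TD target y = r + gamma * Q'(s', mu'(s')).
a2 = W_mu_t @ s2
y = r + gamma * (w_q_t @ np.concatenate([s2, a2]))
x = np.concatenate([s, a])
td_error = y - w_q @ x
w_q = w_q + lr * td_error * x  # gradient step on squared TD error

# Actor: deterministic policy gradient, ascend dQ/da * dmu/dW.
dq_da = w_q[state_dim:]        # for a linear critic, dQ/da is constant
W_mu = W_mu + lr * np.outer(dq_da, s)

# Polyak averaging keeps the targets close to, but lagging, the learners.
W_mu_t = tau * W_mu + (1 - tau) * W_mu_t
w_q_t = tau * w_q + (1 - tau) * w_q_t
```

The target networks and Polyak averaging are what distinguish DDPG from naive actor-critic: bootstrapping against a slowly moving target stabilizes training on continuous-control tasks like bicycle balancing.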

Cited by 19 publications (8 citation statements). References 11 publications.
“…If our agent is to defeat opponents with higher probability, its output must be more stable in the robot confrontation system for micromanagement. Classical Q-learning resolves the exploration-exploitation dilemma with the ε-greedy algorithm, which gives the agent a fixed probability of exploring new actions [28]. However, under ε-greedy every action is equally likely to be selected during exploration, so actions that yield better rewards are not preferentially chosen.…”
Section: An Improved Q-learning Method in Semi-Markov Decision Processes
confidence: 99%
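The limitation this excerpt describes is easy to see in code: when ε-greedy explores, it samples uniformly over all actions, regardless of their estimated values. A minimal sketch (function name and values are illustrative):

```python
# Epsilon-greedy action selection: with probability epsilon, explore by
# picking a uniformly random action (every action equally likely); else
# exploit by picking the greedy action.
import random

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """Return an action index: greedy with prob 1-epsilon, uniform otherwise."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # uniform exploration step
    return max(range(len(q_values)), key=lambda i: q_values[i])

# With epsilon=0 the greedy action (index 2) is always chosen.
assert epsilon_greedy([0.1, 0.5, 0.9], epsilon=0.0) == 2
```

Because the exploration branch ignores `q_values` entirely, an action with a near-greedy estimate is chosen no more often than a clearly bad one, which is the shortcoming the citing paper's improved method targets.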
“…6. The decision-maker is configured as a neural network [82][83][84][85] that can make decisions on the SD compliance levels (within the limit SD max), given the decision-maker's observation of the environment. The environment comprises the ABM, which simulates the effects of these decisions on the transmission and control of COVID-19 within a typical Australian town, as described in the previous section.…”
Section: Agent-Based Model for COVID-19 Transmission and Control
confidence: 99%
“…In recent years, reinforcement learning has been widely used in the field of robot control. Choi et al. (2019) were the first to use DDPG to achieve balance control of an STTW robot. However, that paper did not consider the task of the robot traversing unstructured terrain, which will be a focus of our work.…”
Section: Preliminaries
confidence: 99%