2019
DOI: 10.1016/j.neucom.2018.11.090

Obtaining fault tolerance avoidance behavior using deep reinforcement learning

Abstract: In this article, a mapless movement policy for mobile agents, designed specifically to be fault-tolerant, is presented. The policy, learned using deep reinforcement learning, has advantages over the usual mapless policies: it can handle a robot even when some of its sensors are broken. It is an end-to-end policy based on three neuronal models, capable not only of moving the robot and maximizing the coverage of the environment but also of learning the best movement be…

Cited by 9 publications (7 citation statements)
References 28 publications
“…When faced with different hardware conditions, the system may fail. Aznar et al. [41] designed a navigation policy specifically for fault tolerance, whereby the proposed system continues to work normally under sensor failures and shows advantages in robustness, scalability, and practicality. Choi et al. [42] studied the limited Field Of View (FOV) problem.…”
Section: Sensor Robustness
confidence: 99%
“…The goal of Q-learning (QL) is to learn a policy, that is, a mapping telling an agent what action to take under what circumstances. It does not require a model of the environment, and it can handle problems with stochastic transitions and rewards without requiring adaptations [20], [120]. Applications surveyed, by method:
QL: Intrusion prediction [23], IoT representation annotation [24], Data-driven management [25], Data and Feedback validation [26], Visualization and understanding [27], Learning environment detection [28], Fraud detection [29], Prediction of the performance [50], Classification of capability [51], Tolerance related acquisition [52], IoT crime forensics [53], Fraud detection in IoT application [54], IoT decision process and making [55].
LA: Intrusion prediction [30], IoT representation annotation [31], Data-driven management [32], Data and Feedback validation [33], Visualization and understanding [34], Learning environment detection [35], Fraud detection [36], Predicting Software Defects on IoTs [56], Prediction of behavioral changes [57], Signature verification [58], Analysis and decisions [59], Auto-selection of IoT task [60], Traffic incident detection [61], Telecommunication [62], Internet networks [63].
MDP: Intrusion prediction [37], IoT representation annotation [38], Data-driven management [39], Data and Feedback validation [40], Visualization and understanding [41], Learning environment detection [42], Fraud detection [43], Re…”
Section: Q-learning
confidence: 99%
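The update rule the quoted passage paraphrases is standard tabular Q-learning: Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)]. A minimal runnable sketch on a toy chain environment follows; the environment, hyperparameters, and all names are illustrative assumptions, not taken from the cited survey:

```python
import numpy as np

# Toy 5-state chain MDP: move left/right, reward only at the right end.
N_STATES, N_ACTIONS = 5, 2  # actions: 0 = left, 1 = right

def step(state, action):
    """Stochastic transition: the chosen move succeeds 90% of the time."""
    move = 1 if action == 1 else -1
    if np.random.rand() < 0.1:
        move = -move
    next_state = int(np.clip(state + move, 0, N_STATES - 1))
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, epsilon = 0.1, 0.95, 0.3  # learning rate, discount, exploration

for episode in range(500):
    s, done, steps = 0, False, 0
    while not done and steps < 200:
        # Epsilon-greedy action selection: no model of the environment needed.
        a = np.random.randint(N_ACTIONS) if np.random.rand() < epsilon else int(Q[s].argmax())
        s_next, r, done = step(s, a)
        # Q-learning update: bootstrap from the greedy value of the next state.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s, steps = s_next, steps + 1

print("Greedy policy (0=left, 1=right):", Q.argmax(axis=1))
```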
“…A recent trend in machine learning has been end-to-end learning, which condenses multiple stages of processing for a given task into a single deep neural network (see, e.g., [16], [17]). A similar idea was applied to FTC in [18], in which a model was trained using reinforcement learning to directly handle faults of the ultrasound sensors of a mobile robot in a kinematic obstacle avoidance problem. Although the approach developed in [18] consists of a sequence of deep neural networks with sensor measurements as input and robot actions as output, the robot state is still explicitly estimated as an intermediate variable.…”
Section: Introduction
confidence: 99%
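A common way to make such an end-to-end policy tolerate sensor faults is to randomly mask sensor channels during training, so the network learns to act from degraded inputs. A minimal PyTorch sketch; the network shape, masking rate, and all names are assumptions for illustration, not the architecture of [18]:

```python
import torch
import torch.nn as nn

N_SENSORS, N_ACTIONS = 16, 3  # e.g., ultrasound ranges in; motion commands out

class FaultTolerantPolicy(nn.Module):
    """Maps raw sensor readings (plus a validity mask) directly to action scores."""
    def __init__(self):
        super().__init__()
        # The mask is concatenated so the net can tell "broken" from "reads zero".
        self.net = nn.Sequential(
            nn.Linear(2 * N_SENSORS, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS),
        )

    def forward(self, readings, mask):
        x = torch.cat([readings * mask, mask], dim=-1)  # zero out faulty channels
        return self.net(x)

policy = FaultTolerantPolicy()
readings = torch.rand(8, N_SENSORS)              # a batch of sensor snapshots
mask = (torch.rand(8, N_SENSORS) > 0.2).float()  # ~20% of sensors "broken"
action_scores = policy(readings, mask)           # optimized with any standard RL loss
```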
“…The stages of fault detection and isolation (FDI) and control are replaced with a single recurrent neural network (RNN) with sensor measurements as input and control variables as output, in order to obtain a faster design process than with classical methods. In contrast to [18], our deep FTC (DFTC) method has no explicit representation of the observed system states, and its training is based on supervised learning rather than reinforcement learning. DFTC only requires (i) the availability of a (non-fault-tolerant) full state feedback control law, which is used as an ideal reference during the training phase, and (ii) the observability of the state vector using only the available non-faulty sensors, for all considered sensor faults.…”
Section: Introduction
confidence: 99%
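The supervised setup this passage describes can be sketched as imitation of a full state feedback law: an RNN sees only (possibly faulty) sensor measurements and is trained to reproduce the control u = -Kx that the reference law would apply. The dimensions, gain K, measurement matrix C, fault rate, and data generation below are illustrative assumptions, not the paper's system:

```python
import torch
import torch.nn as nn

N_SENS, N_STATE, N_CTRL, T = 6, 4, 2, 50  # sensors, states, controls, sequence length

class DFTCNet(nn.Module):
    """RNN from sensor sequences to controls, with no explicit state estimate."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(N_SENS, 32, batch_first=True)
        self.head = nn.Linear(32, N_CTRL)

    def forward(self, measurements):   # (batch, T, N_SENS)
        h, _ = self.rnn(measurements)
        return self.head(h)            # (batch, T, N_CTRL)

net = DFTCNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
K = torch.randn(N_CTRL, N_STATE)   # reference full-state feedback gain (assumed given)
C = torch.randn(N_SENS, N_STATE)   # measurement matrix: state observable from sensors

for step in range(200):
    x = torch.randn(32, T, N_STATE)                   # simulated state trajectories
    y = x @ C.T + 0.01 * torch.randn(32, T, N_SENS)   # corresponding measurements
    fault = (torch.rand(32, 1, N_SENS) > 0.15).float()  # stuck-at-zero sensor faults
    u_ref = x @ (-K).T                                # ideal reference control u = -Kx
    loss = nn.functional.mse_loss(net(y * fault), u_ref)
    opt.zero_grad(); loss.backward(); opt.step()
```

Under the quoted requirements, the loss can only vanish if the remaining non-faulty sensors keep the state observable, which is exactly condition (ii) above.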