2020
DOI: 10.1002/asjc.2336

Q‐learning for noise covariance adaptation in extended Kalman filter

Abstract: The extended Kalman filter (EKF) is a widely used method in navigation applications. The EKF suffers from noise covariance uncertainty, which can cause it to perform poorly in practice. This paper attempts to suppress the unfavorable effect of noise covariance uncertainty on the EKF in the framework of reinforcement learning. The proposed state estimation algorithm combines the EKF and a Q-learning method, where a covariance adaptation strategy is designed based on the Q-values, leading to a gradual im…

Cited by 25 publications (12 citation statements) | References 40 publications
“…Each time after executing an action a, the agent receives a response from the environment, which is translated into a reward (R) indicating how good the action is. Significantly, Q-learning at its core seeks to maximize the cumulative reward by performing the best action at each state [12]. The cumulative reward is stored as the Q-value through the Q-learning update rule

Q(s, a) ← Q(s, a) + α[R + γ max_{a′} Q(s′, a′) − Q(s, a)],  (11)

where Q(s, a) ∈ R is the Q-value for action a in state s, R ∈ R is the reward gained by executing action a in state s, s′ is the next state, α is the learning rate, and γ is the discount factor.…”
Section: Preliminaries On Q-learning Approach
Mentioning confidence: 99%
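
For reference, a minimal Python sketch of the tabular update rule (11); the state/action encoding, reward, and hyperparameter values here are illustrative assumptions, not taken from the cited paper.

# Tabular Q-learning update as in Eq. (11):
# Q(s,a) <- Q(s,a) + alpha * (R + gamma * max_a' Q(s',a') - Q(s,a))
from collections import defaultdict

ALPHA = 0.1   # learning rate alpha (illustrative value)
GAMMA = 0.9   # discount factor gamma (illustrative value)

Q = defaultdict(float)  # Q[(state, action)] -> Q-value, zero-initialized

def q_update(state, action, reward, next_state, actions):
    # Greedy bootstrap over the successor state's actions.
    best_next = max(Q[(next_state, a)] for a in actions)
    td_error = reward + GAMMA * best_next - Q[(state, action)]
    Q[(state, action)] += ALPHA * td_error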
“…Recent advancements in Reinforcement Learning (RL) have made it appealing for coping with uncertain environments. Specifically, this work is motivated by the strength of the Q-learning method [9]–[12], in which an intelligent agent learns how to take actions in an environment with uncertain parameters. To the best of our knowledge, the adaptation of the process and measurement noise covariance matrices of the EKF based on Q-learning has not yet been addressed within the scope of attitude and related state estimation with MARG sensors.…”
Section: Introduction
Mentioning confidence: 99%
“…The first group comprises methods that tune the parameters of the KF, such as the process noise covariance matrix Q, the measurement noise covariance matrix R, weighting factors, etc., where those parameters are predicted by AI techniques [28]–[44]. In particular, many studies in this group focus on tuning the noise covariance matrices.…”
Section: Tuning Parameters Of KF
Mentioning confidence: 99%
“…Second, it is sufficient for Q-learning to explore and exploit all directions from the center element. Third, all of its 9 elements can be visited within a shorter learning period, compared to the larger grids in [8], [10]. Fourth, by dynamically updating the ratios q and r and the central element (Q_c, R_c), it allows Q-learning to search an arbitrarily large scope (by manipulating q and r) with arbitrarily high precision (by setting q and r close to 1).…”
Section: A. The Dynamic Grid And Updated ϵ-Greedy Algorithm
Mentioning confidence: 99%
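
To make the grid concrete, here is a hedged Python sketch assuming scalar center elements Q_c and R_c scaled by the ratios q and r to form the 3×3 grid, with an ϵ-greedy choice over its 9 elements; the cited work operates on covariance matrices and re-centers the grid dynamically, so this is a simplification.

import random

def make_grid(Qc, Rc, q, r):
    # 9 candidate (Q, R) pairs: scale Qc by {1/q, 1, q} and Rc by {1/r, 1, r};
    # ratios closer to 1 give a finer, higher-precision grid.
    return [(Qc * fq, Rc * fr)
            for fq in (1.0 / q, 1.0, q)
            for fr in (1.0 / r, 1.0, r)]

def epsilon_greedy(q_values, epsilon):
    # Explore a random grid index with probability epsilon, else exploit.
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)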
“…As one of the most important reinforcement learning methods, Q-learning [6] has drawn increasing interest for adapting the noise covariance matrices of the EKF [7], [8], owing to its model-free algorithm, low computational demand, and capability of achieving optimality in Markov decision processes [9]. In our previous work [10], a Q-learning-based EKF (QLEKF) was proposed to autonomously adapt the values of the process and measurement noise covariance matrices in the attitude estimation of a rigid body.…”
Section: Introduction
Mentioning confidence: 99%
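
For intuition, a schematic sketch of one QLEKF step; the ekf and agent interfaces are hypothetical placeholders, and the innovation-based reward is an assumption made here for illustration, not necessarily the reward defined in [10].

import numpy as np

def qlekf_step(ekf, agent, z):
    # The agent picks a (Q, R) pair, e.g. epsilon-greedy over its grid.
    Qk, Rk = agent.select_action()
    ekf.predict(Q=Qk)                     # time update with the chosen Q
    innovation = z - ekf.predicted_measurement()
    ekf.update(z, R=Rk)                   # measurement update with the chosen R
    reward = -float(np.linalg.norm(innovation))
    agent.learn(reward)                   # Q-value update as in Eq. (11)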