2022 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT) 2022
DOI: 10.1109/iaict55358.2022.9887435

A Robust Offline Reinforcement Learning Algorithm Based on Behavior Regularization Methods

Cited by 3 publications
(3 citation statements)
References 5 publications
“…There is a growing number of results under partial coverage following the principle of pessimism in offline RL (Yu et al., 2020; Kidambi et al., 2020). In comparison to works that focus on tabular (Rashidinejad et al., 2021; Shi et al., 2022; Yin and Wang, 2021) or linear models (Jin et al., 2020; Chang et al., 2021; Zhang et al., 2022; Nguyen-Tang et al., 2022; Bai et al., 2022), our emphasis is on general function approximation (Jiang and Huang, 2020; Uehara and Sun, 2022; Xie et al., 2021; Zhan et al., 2022; Rashidinejad et al., 2022; Zanette and Wainwright, 2022). Among them, we specifically focus on model-free methods.…”
Section: Related Work
mentioning
confidence: 99%
“…We now proceed to bound (29). It is worth noting that both $f_t$ and $(s_t, a_t)$ depend on $s_0, a_0, s_1,$ .…”
Section: Lemma
mentioning
confidence: 99%
“…Defenses against Adversarial Attacks on RL 2.3.2.4 Defenses against Data Corruptions Zhang et al. [81] investigate offline-RL's robustness when data corruption occurs. The authors examine the situation where an adversary can modify the ϵ fraction of a batch dataset composed of tuples (s, a, s′, r), with the objective of enabling an agent to learn a policy that is near optimal. Through theoretical analysis, they propose robust variants of the Least Square Value Iteration algorithms as well as provide general robustness bounds for RL.…”
mentioning
confidence: 99%
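
The corruption setting described in the last statement can be illustrated with a minimal sketch. It is not the method of Zhang et al. or of the cited paper: the function names `corrupt_dataset` and `flip_reward` are hypothetical, and the adversary here is assumed to pick tuples uniformly at random and negate the reward, whereas the setting quoted above allows the adversary to modify its ϵ fraction of the tuples without such restrictions.

```python
import random


def corrupt_dataset(dataset, epsilon, corrupt_fn, seed=0):
    """Return a copy of `dataset` with an epsilon fraction of tuples modified.

    Hypothetical helper: tuples are chosen uniformly at random, which is a
    simplifying assumption and weaker than a worst-case adversary.
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    rng = random.Random(seed)
    n = len(dataset)
    k = int(epsilon * n)  # number of tuples the adversary may change
    indices = rng.sample(range(n), k)
    corrupted = list(dataset)
    for i in indices:
        corrupted[i] = corrupt_fn(dataset[i])
    return corrupted, sorted(indices)


def flip_reward(transition):
    """Example corruption (an assumption): negate the reward of (s, a, s', r)."""
    s, a, s_next, r = transition
    return (s, a, s_next, -r)


# Toy batch dataset of (s, a, s', r) tuples with reward 1.0 everywhere.
clean = [(i, i % 2, i + 1, 1.0) for i in range(100)]
noisy, touched = corrupt_dataset(clean, epsilon=0.25, corrupt_fn=flip_reward)
```

A learner that only sees `noisy` cannot tell which tuples were changed; `touched` is returned here purely so the effect of the corruption can be inspected.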