2019
DOI: 10.48550/arxiv.1909.02109
Preprint

Stochastic Linear Optimization with Adversarial Corruption

Abstract: We extend the model of stochastic bandits with adversarial corruption (Lykouris et al., 2018) to the stochastic linear optimization problem (Dani et al., 2008). Our algorithm is agnostic to the amount of corruption chosen by the adaptive adversary. The regret of the algorithm increases only linearly in the amount of corruption. Our algorithm uses the Löwner–John ellipsoid for exploration and divides the time horizon into epochs of exponentially increasing size to limit the influence of corruption.
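
A rough way to see the epoch idea from the abstract is the sketch below: the time horizon is split into epochs whose lengths grow geometrically, so a fixed corruption budget can dominate only the first few (roughly logarithmically many) epochs. This is a minimal, hypothetical Python sketch, not the authors' implementation; the function name epoch_schedule and the growth factor base are illustrative choices not taken from the paper.

```python
# Minimal sketch (assumed, not from the paper): split a horizon of T rounds
# into epochs whose lengths grow as 1, base, base^2, ... Intuitively, a
# corruption budget C can dominate only epochs of length at most ~C, i.e.
# roughly the first log(C) epochs, which limits its overall influence.

def epoch_schedule(T, base=2):
    """Return (start, end) round-index pairs covering rounds 0..T-1."""
    epochs, start, length = [], 0, 1
    while start < T:
        end = min(start + length, T)
        epochs.append((start, end))
        start, length = end, length * base
    return epochs

if __name__ == "__main__":
    # Example: a horizon of 20 rounds gives epochs of lengths 1, 2, 4, 8, 5.
    print(epoch_schedule(20))  # [(0, 1), (1, 3), (3, 7), (7, 15), (15, 20)]
```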

Cited by 11 publications (19 citation statements)
References 7 publications
“…Gupta et al (2019) proposed an improved algorithm that achieves a regret bound with only additive dependence on C. On the flip side, many research efforts have also been devoted to designing adversarial attacks that cause standard bandit algorithms to fail (Jun et al, 2018; Liu and Shroff, 2019; Lykouris et al, 2018; Garcelon et al, 2020). Stochastic Linear Bandits with Corruptions: Li et al (2019a) studied stochastic linear bandits with adversarial corruptions and achieved an O(d^{5/2} C/∆ + d^6/∆^2) regret bound, where d is the dimension of the context vectors and ∆ is the gap between the rewards of the best and the second-best actions in the decision set D. Bogunovic et al (2021) … Lee et al (2021) used a slightly different definition of regret and adopted a strong assumption on corruptions, namely that at each round t the corruptions on rewards are linear in the actions. Linear Contextual Bandits with Corruptions: Bogunovic et al (2021) studied linear contextual bandits with adversarial corruptions and considered the setting under the assumption that … or equal to Quantity(A) in their proof.…”
Section: Related Work
confidence: 99%
“…This assumption on the noise at round t is a variant of that in Zhou et al (2020): here we require the noise to be generated for all a ∈ D_t in advance, before the adversary decides the corrupted reward function. Our assumption on the noise is more general than those in Li et al (2019a), Bogunovic et al (2021) and Kapoor et al (2019), where the noise is assumed to be 1-sub-Gaussian or Gaussian. The motivation behind this assumption is that the environment may change over time in practical applications.…”
Section: Preliminaries
confidence: 99%
“…Note that the works Zimmert and Seldin [2021] and Masoudian and Seldin [2021] use a different regret metric, and the additive term of E[C] is necessary if we convert their results to E[R_T] (see Appendix G.2). Besides, the corrupted setting is also considered for linear bandits Li et al [2019], Bogunovic et al [2020, 2021], Lee et al [2021]. The works Li et al [2019] and Bogunovic et al [2021] incur an O(C^2) or O(C/∆_min) dependence on the corruption in the regret when the adversary exactly knows the currently chosen arm, while Lee et al [2021] provides a high-probability regret bound with an additive O(C) dependence when the adversary is unaware of the current choice.…”
Section: Related Work
confidence: 99%
“…Besides, the corrupted setting is also considered for linear bandits Li et al [2019], Bogunovic et al [2020, 2021], Lee et al [2021]. The works Li et al [2019] and Bogunovic et al [2021] incur an O(C^2) or O(C/∆_min) dependence on the corruption in the regret when the adversary exactly knows the currently chosen arm, while Lee et al [2021] provides a high-probability regret bound with an additive O(C) dependence when the adversary is unaware of the current choice. The result in Lee et al [2021] becomes O(C) + O(K^{1.5}/∆_min) in the MAB setup.…”
Section: Related Work
confidence: 99%
“…Studies of online optimization algorithms robust to adversarial corruptions have been extended to a variety of models, including the multi-armed bandit [Lykouris et al, 2018, Gupta et al, 2019, Zimmert and Seldin, 2021, Hajiesmaili et al, 2020], Gaussian process bandits [Bogunovic et al, 2020], Markov decision processes [Lykouris et al, 2019], prediction with expert advice [Amir et al, 2020], online linear optimization [Li et al, 2019], and linear bandits [Bogunovic et al, 2021, Lee et al, 2021]. There is also a literature on effective attacks against bandit algorithms [Jun et al, 2018, Liu and Shroff, 2019].…”
Section: Related Work
confidence: 99%