2019
DOI: 10.48550/arxiv.1909.02109
Preprint

Stochastic Linear Optimization with Adversarial Corruption

Abstract: We extend the model of stochastic bandits with adversarial corruption (Lykouris et al., 2018) to the stochastic linear optimization problem (Dani et al., 2008). Our algorithm is agnostic to the amount of corruption chosen by the adaptive adversary. The regret of the algorithm increases only linearly in the amount of corruption. Our algorithm uses the Löwner–John ellipsoid for exploration and divides the time horizon into epochs of exponentially increasing size to limit the influence of corruption.
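
A rough way to see the epoch idea from the abstract is the sketch below: the time horizon is split into epochs whose lengths grow geometrically, so a fixed corruption budget can dominate only the first few (roughly logarithmically many) epochs. This is a minimal, hypothetical Python sketch, not the authors' implementation; the function name epoch_schedule and the growth factor base are illustrative choices not taken from the paper.

```python
# Minimal sketch (assumed, not from the paper): split a horizon of T rounds
# into epochs whose lengths grow as 1, base, base^2, ... Intuitively, a
# corruption budget C can dominate only epochs of length at most ~C, i.e.
# roughly the first log(C) epochs, which limits its overall influence.

def epoch_schedule(T, base=2):
    """Return (start, end) round-index pairs covering rounds 0..T-1."""
    epochs, start, length = [], 0, 1
    while start < T:
        end = min(start + length, T)
        epochs.append((start, end))
        start, length = end, length * base
    return epochs

if __name__ == "__main__":
    # Example: a horizon of 20 rounds gives epochs of lengths 1, 2, 4, 8, 5.
    print(epoch_schedule(20))  # [(0, 1), (1, 3), (3, 7), (7, 15), (15, 20)]
```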

Cited by 11 publications (19 citation statements)
References 7 publications
“…Gupta et al (2019) proposed an improved algorithm that achieves a regret bound with only additive dependence on C. On the flip side, many research efforts have also been devoted to designing adversarial attacks that cause standard bandit algorithms to fail (Jun et al, 2018; Liu and Shroff, 2019; Lykouris et al, 2018; Garcelon et al, 2020). Stochastic Linear Bandits with Corruptions: Li et al (2019a) studied stochastic linear bandits with adversarial corruptions and achieved an O(d^{5/2} C/∆ + d^6/∆^2) regret bound, where d is the dimension of the context vectors and ∆ is the gap between the rewards of the best and the second-best actions in the decision set D. Bogunovic et al (2021) … Lee et al (2021) used a slightly different definition of regret and adopted a strong assumption on corruptions, namely that at each round t the corruptions on rewards are linear in the actions. Linear Contextual Bandits with Corruptions: Bogunovic et al (2021) studied linear contextual bandits with adversarial corruptions and considered the setting under the assumption that … or equal to Quantity(A) in their proof.…”
Section: Related Work
confidence: 99%
“…This assumption on the noise at round t is a variant of that in Zhou et al (2020): here we require the noise to be generated for all a ∈ D_t in advance, before the adversary decides the corrupted reward function. Our assumption on the noise is more general than those in Li et al (2019a), Bogunovic et al (2021) and Kapoor et al (2019), where the noise is assumed to be 1-sub-Gaussian or Gaussian. The motivation behind this assumption is that the environment may change over time in practical applications.…”
Section: Preliminaries
confidence: 99%
“…Note that the works Zimmert and Seldin [2021] and Masoudian and Seldin [2021] use a different regret metric, and the additive term of E[C] is necessary if we convert their results to E[R_T] (see Appendix G.2). Besides, the corrupted setting is also considered for linear bandits Li et al [2019], Bogunovic et al [2020, 2021], Lee et al [2021]. The works Li et al [2019] and Bogunovic et al [2021] incur an O(C^2) or O(C/∆_min) dependence on the corruption in the regret when the adversary exactly knows the currently chosen arm, while Lee et al [2021] provides a high-probability regret bound with an additive O(C) dependence when the adversary is unaware of the current choice.…”
Section: Related Work
confidence: 99%
“…Besides, the corrupted setting is also considered for linear bandits Li et al [2019], Bogunovic et al [2020, 2021], Lee et al [2021]. The works Li et al [2019] and Bogunovic et al [2021] incur an O(C^2) or O(C/∆_min) dependence on the corruption in the regret when the adversary exactly knows the currently chosen arm, while Lee et al [2021] provides a high-probability regret bound with an additive O(C) dependence when the adversary is unaware of the current choice. The result in Lee et al [2021] becomes O(C) + O(K^{1.5}/∆_min) in the MAB setup.…”
Section: Related Work
confidence: 99%
“…Studies of online optimization algorithms robust to adversarial corruptions have been extended to a variety of models, including the multi-armed bandit [Lykouris et al, 2018, Gupta et al, 2019, Zimmert and Seldin, 2021, Hajiesmaili et al, 2020], Gaussian process bandits [Bogunovic et al, 2020], Markov decision processes [Lykouris et al, 2019], prediction with expert advice [Amir et al, 2020], online linear optimization [Li et al, 2019], and linear bandits [Bogunovic et al, 2021, Lee et al, 2021]. There is also a literature on effective attacks against bandit algorithms [Jun et al, 2018, Liu and Shroff, 2019].…”
Section: Related Work
confidence: 99%