2017
DOI: 10.1109/tsp.2017.2706192

On Sequential Elimination Algorithms for Best-Arm Identification in Multi-Armed Bandits

Abstract: We consider the best-arm identification problem in multi-armed bandits, which focuses purely on exploration. A player is given a fixed budget to explore a finite set of arms, and the rewards of each arm are drawn independently from a fixed, unknown distribution. The player aims to identify the arm with the largest expected reward. We propose a general framework to unify sequential elimination algorithms, where the arms are dismissed iteratively until a unique arm is left. Our analysis reveals a novel performan…
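The sequential-elimination idea the abstract describes, dismissing arms round by round until one survives, can be sketched as below. This is a minimal illustration, not the paper's algorithm: the equal per-round budget split and the Bernoulli arms are assumptions made here for the example.

```python
import random

def sequential_elimination(arms, budget, seed=0):
    """Pull arms in rounds, dismissing the empirically worst arm each
    round until a single arm remains; return the survivor's index.

    `arms` is a list of no-argument callables, each returning one random
    reward. The equal per-round budget split below is illustrative; the
    paper unifies more refined allocation schedules under one framework.
    """
    random.seed(seed)
    K = len(arms)
    active = list(range(K))
    totals = [0.0] * K
    counts = [0] * K
    per_round = budget // (K - 1)  # K-1 elimination rounds in total
    for _ in range(K - 1):
        pulls_each = max(1, per_round // len(active))
        for a in active:
            for _ in range(pulls_each):
                totals[a] += arms[a]()
                counts[a] += 1
        # Dismiss the arm with the lowest empirical mean so far.
        worst = min(active, key=lambda a: totals[a] / counts[a])
        active.remove(worst)
    return active[0]

# Illustrative Bernoulli arms with means 0.2, 0.5, 0.8.
means = [0.2, 0.5, 0.8]
arms = [lambda m=m: 1.0 if random.random() < m else 0.0 for m in means]
best = sequential_elimination(arms, budget=3000)
```

With this budget the arm with the largest mean (index 2) is identified with high probability; the fixed seed makes the demo run deterministic.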


Cited by 24 publications (14 citation statements)
References 21 publications
“…The algorithm is summarized in Algorithm 2. Compared with the original UCB algorithm in (3), the main difference is the additional term γ(μ(t − 1), N(t − 1)) in (15). We now highlight the main idea why our bandit algorithm works and the role of this additional term.…”
Section: A Robust Bandit Algorithm
confidence: 97%
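The passage quoted above describes a UCB index augmented with an additional term γ(μ(t − 1), N(t − 1)). The exact form of γ is not given in the excerpt, so the sketch below treats it as a caller-supplied function with a purely hypothetical placeholder; only the standard UCB bonus term is standard.

```python
import math

def robust_ucb_index(mean_hat, n_pulls, t, gamma):
    """Index of one arm at time t: empirical mean, plus the standard
    UCB exploration bonus sqrt(2 ln t / n), plus an additional term
    gamma(mean_hat, n_pulls) as the quoted passage describes. The form
    of gamma in the cited paper is not reproduced here; it is passed in.
    """
    bonus = math.sqrt(2.0 * math.log(t) / n_pulls)
    return mean_hat + bonus + gamma(mean_hat, n_pulls)

# Hypothetical placeholder gamma: an extra widening that shrinks as the
# arm is pulled more often (for illustration only).
idx = robust_ucb_index(0.6, 25, t=100, gamma=lambda m, n: 1.0 / n)
```

The arm maximizing this index would be pulled next, exactly as in ordinary UCB; the extra term only inflates the index.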
“…else 5: The user chooses arm I_t to pull according to (15). 6: end if 7: if The attacker decides to attack then 8: The attacker attacks and changes I_t to I_t^0.…”
confidence: 99%
“…Xu [9] used MAB models to balance exploiting user data and protecting user privacy in dynamic pricing. Shahrampour [10] proposed a new algorithm for choosing the best arm of a MAB, which outperforms the state of the art. Lacerda [11] proposed an algorithm named Multi-Objective Ranked Bandits for recommender systems.…”
Section: Related Work
confidence: 99%
“…These studies seldom consider the uncertainty of users' behaviours, so this paper introduces an online learning method called multi-armed bandits (MAB) to solve the problem. MAB has shown effectiveness and merit in air conditioning demand aggregation [16] and many other sequential decision-making problems containing uncertain/unknown behavioural factors [17][18][19][20][21][22][23][24][25][26][27]. In reference [28], an adversarial MAB framework is applied to learn the signal response of thermal control loads for demand response in real-time.…”
Section: Introduction
confidence: 99%