Safe Linear Thompson Sampling With Side Information
2021
DOI: 10.1109/tsp.2021.3089822

Cited by 10 publications (12 citation statements)
References 24 publications
“…We verify the theoretical study above with simulations over Example 5.6, and study the relative performance of DOCLB and the optimistic-pessimistic method Safe-LTS [MAAT21]. These implementations are based on the following relaxation of Algorithm 1.…”
Section: Simulations (confidence: 85%)
“…In contrast, Pacchiano et al (2021); Amani et al (2019); Wu et al (2016) all use optimistic-pessimistic methods, which instead maintain upper bounds on both the rewards and safety risk and play the actions with maximum reward upper bound whilst being safe with respect to the stringent risk upper bounds. Moradipari et al (2021) take a similar pessimistic approach, but replace the reward upper bounds with a Thompson sampling procedure that is similar in spirit to our Alg. 2, although this uses optimistic safety indices.…”
Section: Methodological Approaches (confidence: 99%)
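The optimistic-pessimistic pattern described in the passage above (maintain upper bounds on both reward and safety risk, play the highest-reward-upper-bound action that is safe with respect to the stringent risk upper bounds, and fall back to a known safe action otherwise) can be sketched as a one-round selection rule. All names and numbers here are illustrative, not taken from any of the cited papers:

```python
# Minimal sketch of an optimistic-pessimistic action-selection step.
# reward_ucb[i] is an optimistic (upper) confidence bound on arm i's reward;
# risk_ucb[i] is a pessimistic (upper) bound on arm i's safety risk.
def select_safe_arm(reward_ucb, risk_ucb, budget, safe_arm):
    """Play the reward-optimistic arm among those certified safe;
    fall back to the known safe arm if nothing is certified."""
    feasible = [i for i, r in enumerate(risk_ucb) if r <= budget]
    if not feasible:                 # no arm provably within the risk budget
        return safe_arm
    return max(feasible, key=lambda i: reward_ucb[i])

# Arm 0 is reward-optimistic but its risk upper bound (0.8) exceeds the
# budget, so the rule picks arm 1, the best arm among the certified-safe set.
choice = select_safe_arm([0.9, 0.7, 0.5], [0.8, 0.3, 0.1], budget=0.5, safe_arm=2)
```

Replacing `reward_ucb` with a Thompson sample of the reward parameter, while keeping the pessimistic `risk_ucb` filter, gives the hybrid approach the passage attributes to Moradipari et al (2021).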
“…These papers also study hard round-wise safety constraints, and again utilise a known safe action, as well as the continuity of the action space, to enable sufficient exploration. We note that the particulars of the signalling model adopted by Amani et al (2019) preclude extending their results to the multi-armed setting, and while the model of Moradipari et al (2021) does admit such extension, the scheme proposed fundamentally relies on having a continuous action space with a linear safety-risk, and cannot be extended to multi-armed settings without lifting to policy space.…”
Section: Per-round Constraints (confidence: 97%)
“…Two well-known algorithms for LB are linear UCB (LinUCB) and linear Thompson Sampling (LinTS). [8] provided a regret bound of order O(√T log T) for LinUCB, and [9], [10], [11], and [12] provided a regret bound of order O(√T (log T)^{3/2}) for LinTS in a frequentist setting, where the unknown reward parameter θ is fixed.…”
Section: Related Work (confidence: 99%)
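The quoted passage contrasts LinUCB with frequentist LinTS (fixed unknown θ, sampled estimates). A minimal LinTS loop can be sketched as follows; the toy instance, dimensions, and the inflated sampling covariance are illustrative assumptions, not the constructions analysed in [9]–[12]:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy instance: d-dimensional linear bandit with K fixed arms,
# Gaussian reward noise, and a fixed (frequentist) unknown parameter theta_star.
d, K, T = 3, 10, 2000
theta_star = np.array([1.0, 0.0, 0.0])
arms = rng.normal(size=(K, d))
arms[0] = np.array([1.0, 0.0, 0.0])     # ensure one clearly good arm exists

lam, noise = 1.0, 0.1
v = noise * np.sqrt(d * np.log(T))      # inflated sampling scale, as in
                                        # frequentist LinTS analyses
V = lam * np.eye(d)                     # regularised design matrix
b = np.zeros(d)

total_reward = 0.0
for _ in range(T):
    theta_hat = np.linalg.solve(V, b)   # ridge estimate of theta_star
    cov = v**2 * np.linalg.inv(V)
    # TS step: sample a perturbed parameter, then act greedily w.r.t. it.
    theta_tilde = rng.multivariate_normal(theta_hat, cov)
    x = arms[int(np.argmax(arms @ theta_tilde))]
    r = float(x @ theta_star) + noise * rng.normal()
    V += np.outer(x, x)
    b += r * x
    total_reward += r

best = float(np.max(arms @ theta_star))
regret = T * best - total_reward        # cumulative regret vs. the best arm
```

The √(d log T) inflation of the sampling covariance is what distinguishes frequentist LinTS from a plain Bayesian posterior draw and is the source of the extra (log T)^{1/2} factor in its regret bound relative to LinUCB.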