2004
DOI: 10.1214/105051604000000350

When can the two-armed bandit algorithm be trusted?

Abstract: We investigate the asymptotic behavior of one version of the so-called two-armed bandit algorithm. It is an example of a stochastic approximation procedure whose associated ODE has both a repulsive and an attractive equilibrium, at which the procedure is noiseless. We show that if the gain parameter is constant or goes to 0 not too fast, the algorithm does fall in the noiseless repulsive equilibrium with positive probability, whereas it always converges to its natural attractive target when the gain parameter go…
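
The abstract does not spell out the update rule, so the following is only a minimal sketch under a stated assumption: that the procedure has the classical linear reward-inaction form, where `x` is the probability of playing arm A. The names `simulate_bandit`, `p_a`, `p_b`, and `gain` are hypothetical and chosen for illustration. Under this assumed rule the mean increment is `gain * (p_a - p_b) * x * (1 - x)`, so the mean dynamics have two equilibria, 0 and 1, and the update is exactly zero at both of them, which is one way to read "noiseless".

```python
import random


def simulate_bandit(p_a, p_b, gain, x0=0.5, steps=10_000, seed=0):
    """Simulate an assumed linear reward-inaction two-armed bandit.

    x      : current probability of playing arm A (kept in [0, 1])
    p_a    : success probability of arm A (assumption: Bernoulli rewards)
    p_b    : success probability of arm B
    gain   : function n -> step size in (0, 1) at step n
    """
    rng = random.Random(seed)
    x = x0
    for n in range(1, steps + 1):
        g = gain(n)
        if rng.random() < x:
            # Arm A was played; move toward 1 only if it paid off.
            if rng.random() < p_a:
                x += g * (1.0 - x)
        else:
            # Arm B was played; move toward 0 only if it paid off.
            if rng.random() < p_b:
                x -= g * x
    return x


if __name__ == "__main__":
    # Two illustrative gain schedules (hypothetical choices).
    constant_gain = lambda n: 0.1
    decreasing_gain = lambda n: 1.0 / (n + 1)
    for name, gain in [("constant", constant_gain), ("decreasing", decreasing_gain)]:
        finals = [simulate_bandit(0.6, 0.4, gain, seed=s) for s in range(20)]
        print(name, [round(x, 3) for x in finals])
```

Running the script over several seeds gives a rough empirical feel for how the final value of `x` depends on the gain schedule; it is not a substitute for the paper's analysis.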

Cited by 29 publications (70 citation statements)
References: 11 publications
“…A challenging question would be to prove (or disprove) that this event has zero probability when A ≤ 1. This is reminiscent of the situation thoroughly analyzed for two-armed bandit problems in [29,30].…”
Section: Examples and Applications (mentioning)
confidence: 92%
“…The optimal bids plotted in Figure 2 are consistent with intuition: the higher the remaining budget and the closer to the terminal time, the higher the optimal bid. alternative is to use multi-armed bandit algorithms such as those proposed in [18,19,20]. 22 A Pareto distribution or an extreme-value distribution are also relevant choices.…”
Section: Numerical Approximations Of V and B * (mentioning)
confidence: 99%
“…This choice is justified by two main reasons, (i) the continuous problem is more natural to handle by using stochastic approximation techniques, and (ii) it joins the current literature on the topic, both from a stochastic-control perspective (Guéant et al, 2012; and from an online learning perspective (Laruelle et al, 2011;. Nevertheless, a natural alternative to our modeling, is to directly focus on the discrete grid of prices and define the optimal strategy by using a multi-armed bandit algorithm (Lamberton et al, 2004; Lamberton and Pagès, 2008). To our knowledge, this research direction has not been explored for the case of algorithmic-trading strategies.…”
Section: A Caveat Concerning Modeling Choices (mentioning)
confidence: 99%