2000 IEEE International Symposium on Information Theory (Cat. No.00CH37060)
DOI: 10.1109/isit.2000.866371

Universal linear least-squares prediction

Abstract: An approach to the problem of linear prediction is discussed that is based on recent developments in the universal coding and computational learning theory literature. This development provides a novel perspective on the adaptive filtering problem, and represents a significant departure from traditional adaptive filtering methodologies. In this context, we demonstrate a sequential algorithm for linear prediction whose accumulated squared prediction error, for every possible sequence, is asymptotically as small…
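The sequential predictor the abstract alludes to can be illustrated with a minimal sketch. The recursive-least-squares routine below is a standard textbook predictor, not the authors' exact algorithm; all names and parameter values here are illustrative:

```python
import numpy as np

def rls_predict(x, order=2, delta=0.01):
    """Predict x[t] sequentially from the previous `order` samples
    with recursive least squares; returns the per-step predictions."""
    P = np.eye(order) / delta            # inverse-correlation estimate
    w = np.zeros(order)                  # linear predictor coefficients
    preds = np.zeros(len(x))
    for t in range(order, len(x)):
        u = x[t - order:t][::-1]         # most recent sample first
        preds[t] = w @ u                 # predict before seeing x[t]
        err = x[t] - preds[t]
        k = P @ u / (1.0 + u @ P @ u)    # gain vector
        w = w + k * err                  # coefficient update
        P = P - np.outer(k, u @ P)       # rank-one update of P
    return preds

# A sinusoid obeys an exact order-2 linear recursion, so the
# predictor adapts and the per-step error shrinks quickly.
x = np.sin(0.5 * np.arange(200))
p = rls_predict(x, order=2)
```

The "universal" guarantee discussed in the abstract is of a different, stronger kind: the accumulated squared error is compared, on every individual sequence, against the best fixed linear predictor chosen in hindsight.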

Cited by 13 publications (8 citation statements)
References 34 publications
“…In the machine learning literature [1], [2], online learning [3] is investigated extensively in fields ranging from game theory [4], [5], control theory [6]-[8], and decision theory [9], [10] to computational learning theory [11], [12]. Because of the widely used universal prediction perspective [13], it has been applied considerably in data and signal processing [14]-[19], especially in sequential prediction and estimation problems [20]-[23] such as density estimation and anomaly detection [24]-[28]. Some of its most prominent applications are in multi-agent systems [29]-[31] and specifically in reinforcement learning problems [32]-[42].…”
Section: A Preliminaries (mentioning)
confidence: 99%
“…The multi-armed bandit is widely considered to be the limited-feedback version of the well-studied problem of prediction with expert advice [15]-[17], [21], [23]. Due to the nature of the problem, only the loss of the selected arm is observed (the others remain hidden).…”
Section: A Preliminaries (mentioning)
confidence: 99%
“…In these applications of reinforcement learning, we encounter the fundamental exploration-exploitation tradeoff, which is most thoroughly studied in the multi-armed bandit problem [8]. The multi-armed bandit problem is generally considered to be the limited-feedback version of the well-studied problem of prediction with expert advice [9]-[13]. It has attracted significant attention, since the bandit setting can be applied successfully to a wide range of learning applications, from recommender systems [14] and dimensionality reduction [15] to probability matching [16].…”
Section: A Preliminaries (mentioning)
confidence: 99%
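The limited-feedback setting described in these excerpts can be sketched with the standard EXP3 forecaster. This is a generic textbook variant, not the exact formulation of any of the cited works, and the two arm loss functions are invented purely for illustration:

```python
import math
import random

def exp3(loss_fns, n_arms, horizon, gamma=0.1, seed=0):
    """EXP3: exponential weights over arms with importance-weighted
    loss estimates -- only the pulled arm's loss is ever observed."""
    rng = random.Random(seed)
    log_w = [0.0] * n_arms
    total = 0.0
    for t in range(horizon):
        m = max(log_w)
        w = [math.exp(v - m) for v in log_w]   # stable exponentiation
        s = sum(w)
        # mix the exponential weights with uniform exploration
        probs = [(1 - gamma) * wi / s + gamma / n_arms for wi in w]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        loss = loss_fns[arm](t)                # bandit feedback only
        total += loss
        est = loss / probs[arm]                # unbiased loss estimate
        log_w[arm] -= gamma * est / n_arms     # update pulled arm only
    return total

# Two invented arms with constant losses in [0, 1]; arm 0 is better.
arms = [lambda t: 0.1, lambda t: 0.9]
avg_loss = exp3(arms, n_arms=2, horizon=2000) / 2000
```

The importance-weighted estimate `loss / probs[arm]` is what replaces the full loss vector of the expert-advice setting: it is unbiased for the unobserved losses, at the cost of higher variance, which the uniform exploration term keeps bounded.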
“…By summing the probabilities of the strategies that suggest the same bandit arm, we construct the probability of each bandit arm at time t. The strategies to be used are not specifically selected a priori. Instead, at each time t, all of the strategies s_t that comprise the class M_t are treated as experts in our online learning problem [11], [13], [21]. These strategies (or experts) are combined according to their weights w_{s_t}, which indicate our trust in the different strategies, to achieve the performance of the optimal expert.…”
Section: A Brute Force Approach (mentioning)
confidence: 99%
“…if n > k + 1, and the all-zero vector otherwise. It can be shown using a recursive technique (see, e.g., Tsypkin [29], Györfi [15], Singer and Feder [27], and Györfi and Lugosi [18]) that the c_{n,j} can be computed with low computational complexity. The experts are mixed via exponential weighting, which is defined in the same way as earlier.…”
Section: Generalized Linear Prediction Strategy (mentioning)
confidence: 99%
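The mixture idea in this last excerpt — fixed-order linear predictors treated as experts and combined by exponential weighting — can be sketched as follows. This is a simplified illustration (it refits each expert by batch least squares rather than using the low-complexity recursion for the c_{n,j} that the excerpt mentions), and all names and parameter values are invented:

```python
import numpy as np

def mix_orders(x, max_order=4, eta=0.5):
    """Combine fixed-order least-squares predictors ("experts") by
    exponentially weighting their accumulated squared errors."""
    orders = list(range(1, max_order + 1))
    cum_loss = np.zeros(len(orders))
    preds = np.zeros(len(x))
    for t in range(2 * max_order, len(x)):
        # exponential weights, shifted by the minimum for stability
        w = np.exp(-eta * (cum_loss - cum_loss.min()))
        w /= w.sum()
        expert_preds = np.empty(len(orders))
        for i, k in enumerate(orders):
            # batch least squares of order k on all samples seen so far
            U = np.array([x[s - k:s][::-1] for s in range(k, t)])
            y = x[k:t]
            coef, *_ = np.linalg.lstsq(U, y, rcond=None)
            expert_preds[i] = coef @ x[t - k:t][::-1]
        preds[t] = w @ expert_preds             # mixture prediction
        cum_loss += (expert_preds - x[t]) ** 2  # per-expert losses
    return preds

# A sinusoid satisfies an exact order-2 recursion, so the weight
# concentrates on the experts whose order can represent it.
x = np.sin(0.5 * np.arange(120))
p = mix_orders(x)
```

Because the weights decay exponentially in each expert's accumulated squared error, the mixture's cumulative loss tracks that of the best expert in hindsight — the same style of guarantee the cited generalized linear prediction strategy establishes for its exponentially weighted experts.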