1952
DOI: 10.1090/s0002-9904-1952-09620-8
Some aspects of the sequential design of experiments

Cited by 1,489 publications (415 citation statements) | References 5 publications
“…In this paper we analyze the Win-Stay, Lose-Switch (WSLS) model, also known as Win-Stay, Lose-Shift, which is used in psychology, game theory, statistics and machine learning [14], [15].…”
Section: Decision-making Model
confidence: 99%
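The WSLS rule quoted above is simple enough to state in a few lines. The following minimal Python sketch is not from the cited papers; the two-armed Bernoulli reward setting and the function name are illustrative assumptions. The rule: stay on the current arm after a reward (win), switch to the other arm after no reward (loss).

```python
import random

def wsls(arm_probs, n_rounds, seed=0):
    """Win-Stay, Lose-Shift on a two-armed Bernoulli bandit (illustrative)."""
    assert len(arm_probs) == 2  # the stay/shift rule as written needs two arms
    rng = random.Random(seed)
    arm = rng.randrange(2)  # arbitrary first choice
    total = 0
    for _ in range(n_rounds):
        reward = 1 if rng.random() < arm_probs[arm] else 0
        total += reward
        if reward == 0:
            arm = 1 - arm  # lose -> shift to the other arm
        # win -> stay on the same arm
    return total

print(wsls([0.3, 0.7], 1000))
```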
“…The well-studied multiarmed bandit problem was originally proposed by Robbins [7] in 1952. A gambler first chooses among K slot machines to play.…”
Section: Multi-armed Bandit Problem
confidence: 99%
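As a concrete illustration of the setup this statement describes, here is a minimal Python sketch of a K-armed Bernoulli bandit played with a simple epsilon-greedy rule. The reward model, the epsilon value, and the function name are illustrative assumptions, not taken from Robbins (1952).

```python
import random

def epsilon_greedy(arm_probs, n_rounds, eps=0.1, seed=0):
    """Epsilon-greedy on a K-armed Bernoulli bandit (illustrative)."""
    rng = random.Random(seed)
    k = len(arm_probs)
    counts = [0] * k
    sums = [0.0] * k
    total = 0
    for _ in range(n_rounds):
        if rng.random() < eps or 0 in counts:
            arm = rng.randrange(k)  # explore (random until every arm is tried)
        else:
            arm = max(range(k), key=lambda a: sums[a] / counts[a])  # exploit
        reward = 1 if rng.random() < arm_probs[arm] else 0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total

print(epsilon_greedy([0.2, 0.5, 0.8], 10_000))
```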
“…Additionally, among all the reinforcement learning techniques, a set of so-called multi-armed bandit algorithms is particularly suitable for the optimization of the network. That is, the number of transmissions in each sensor node can be furthermore modeled as a multiarmed bandit problem, originally described by Robbins [7]. A multi-armed bandit, also called K-armed bandit, is similar to a traditional slot machine but generally with more than one lever.…”
Section: Introduction
confidence: 99%
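When per-node decisions are modeled as a K-armed bandit as this statement suggests, a standard index policy such as UCB1 (Auer et al. 2002, not from the quoted paper) is one common concrete choice. The sketch below is an illustrative Python implementation under an assumed Bernoulli reward model.

```python
import math
import random

def ucb1(arm_probs, n_rounds, seed=0):
    """UCB1 on a K-armed Bernoulli bandit: pull the arm maximizing
    empirical mean + sqrt(2 ln t / n_a). Illustrative sketch."""
    rng = random.Random(seed)
    k = len(arm_probs)
    counts = [0] * k
    means = [0.0] * k
    total = 0
    for t in range(1, n_rounds + 1):
        if t <= k:
            arm = t - 1  # initialization: pull each arm once
        else:
            arm = max(range(k),
                      key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1 if rng.random() < arm_probs[arm] else 0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running mean
        total += reward
    return total

print(ucb1([0.2, 0.5, 0.8], 10_000))
```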
“…The problem has its roots in the seminal works of Robbins (1952) and Bradt et al (1956), who focused on the much-studied case where engaging a project corresponds to sampling from a Bernoulli population with unknown success probability, the goal being to maximize the expected number of successes over T plays. An MDP formulation is obtained by a Bayesian approach, where a project/population state is its posterior distribution.…”
Section: Finite-horizon Multiarmed Bandits
confidence: 99%
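The Bayesian formulation mentioned here, where each Bernoulli arm's state is its posterior distribution, can be illustrated with Thompson sampling over Beta posteriors. A minimal Python sketch, assuming uniform Beta(1, 1) priors; the function name and horizon handling are illustrative, not from the cited works.

```python
import random

def thompson_bernoulli(arm_probs, n_rounds, seed=0):
    """Thompson sampling on a Bernoulli bandit: each arm carries a
    Beta(successes+1, failures+1) posterior; sample from every posterior
    and play the argmax. Illustrative sketch."""
    rng = random.Random(seed)
    k = len(arm_probs)
    wins = [0] * k
    losses = [0] * k
    total = 0
    for _ in range(n_rounds):
        samples = [rng.betavariate(wins[a] + 1, losses[a] + 1) for a in range(k)]
        arm = max(range(k), key=lambda a: samples[a])
        reward = 1 if rng.random() < arm_probs[arm] else 0
        if reward:
            wins[arm] += 1
        else:
            losses[arm] += 1
        total += reward
    return total

print(thompson_bernoulli([0.2, 0.5, 0.8], 10_000))
```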