We consider a learning problem where the decision maker interacts with a standard Markov decision process, with the exception that the reward functions vary arbitrarily over time. We show that, against every possible realization of the reward process, the agent can perform as well—in hindsight—as every stationary policy. This generalizes the classical no-regret result for repeated games. Specifically, we present an efficient online algorithm—in the spirit of reinforcement learning—that ensures that the agent's average performance loss vanishes over time, provided that the environment is oblivious to the agent's actions. Moreover, it is possible to modify the basic algorithm to cope with instances where reward observations are limited to the agent's trajectory. We present further modifications that reduce the computational cost by using function approximation and that track the optimal policy through infrequent changes.
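The guarantee of matching every stationary policy in hindsight generalizes exponential-weights (Hedge) style no-regret bounds for repeated games. The following is a minimal, self-contained sketch of Hedge over a finite set of experts; it is illustrative only, not the paper's MDP algorithm, and the function name and learning rate are our own choices:

```python
import math

def hedge(num_experts, reward_seqs, eta=0.5):
    """Exponential-weights (Hedge) over a finite set of experts.

    reward_seqs[t][i] is the reward of expert i at round t, in [0, 1].
    Returns the algorithm's expected cumulative reward and the best
    expert's cumulative reward, so the regret can be inspected.
    """
    weights = [1.0] * num_experts
    alg_reward = 0.0
    expert_rewards = [0.0] * num_experts
    for rewards in reward_seqs:
        total = sum(weights)
        probs = [w / total for w in weights]
        # Expected reward of the randomized play this round.
        alg_reward += sum(p * r for p, r in zip(probs, rewards))
        for i, r in enumerate(rewards):
            expert_rewards[i] += r
            weights[i] *= math.exp(eta * r)  # multiplicative update
    return alg_reward, max(expert_rewards)
```

The standard analysis gives regret at most ln(N)/eta + eta*T/8 against any reward sequence, which vanishes on average after tuning eta, mirroring the "average performance loss vanishes" claim in the stationary-policy setting.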
We consider the setting of stochastic bandit problems with a continuum of arms indexed by [0, 1]^d. We first point out that the strategies considered so far in the literature only provide theoretical guarantees of the form: given some tuning parameters, the regret is small with respect to a class of environments that depends on these parameters. This, however, is not the right perspective, as it is the strategy that should adapt to the specific bandit environment at hand, and not the other way round. Put differently, an adaptation issue is raised. We solve it for the special case of environments whose mean-payoff functions are globally Lipschitz. More precisely, we show that the minimax-optimal order of magnitude L^{d/(d+2)} T^{(d+1)/(d+2)} of the regret bound over T time instances against an environment whose mean-payoff function f is Lipschitz with constant L can be achieved without knowing L or T in advance. This is in contrast to all previously known strategies, which require, to some extent, knowledge of L to achieve this performance guarantee.
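Adapting to an unknown horizon T is classically achieved with a doubling trick: restart a horizon-tuned strategy on epochs of doubling length. Below is a minimal sketch, where `run_for` stands in for any horizon-dependent strategy; the name and the toy regret function used in the usage example are illustrative assumptions, not the paper's construction:

```python
def doubling_trick(run_for, total_rounds):
    """Run a horizon-dependent strategy on epochs of doubling length.

    run_for(horizon) plays `horizon` rounds of a strategy tuned to that
    horizon and returns its regret on those rounds. Restarting on epochs
    of length 1, 2, 4, ... inflates a T^a regret bound (0 < a < 1) by
    only a constant factor, so no prior knowledge of T is needed.
    """
    played = 0
    regret = 0.0
    epoch = 1
    while played < total_rounds:
        horizon = min(epoch, total_rounds - played)
        regret += run_for(horizon)
        played += horizon
        epoch *= 2
    return regret
```

For example, a strategy with regret h^{3/4} on a known horizon h still incurs only O(T^{3/4}) total regret when wrapped this way. Adapting to the unknown Lipschitz constant L is the harder part of the result and is not captured by this trick.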
We consider a sequential decision problem where the rewards are generated by a piecewise-stationary distribution. However, the different reward distributions are unknown and may change at unknown instants. Our approach uses a limited number of side observations on past rewards, but does not require prior knowledge of the frequency of changes. In spite of the adversarial nature of the reward process, we provide an algorithm whose regret, with respect to the baseline with perfect knowledge of the distributions and the changes, is O(k log T), where k is the number of changes up to time T. This is in contrast to the case where side observations are not available, and where the regret is at least Ω(√T).
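One simple way a few side observations can reveal a change point is a windowed mean-shift test that triggers a restart of the bandit strategy's estimates. The sketch below is purely illustrative; the window length, threshold, and test statistic are our own assumptions, not the paper's detector:

```python
from collections import deque

def detects_change(observations, window=20, threshold=0.3):
    """Compare the means of two adjacent sliding windows of side
    observations; flag a change when they differ by more than
    `threshold`. A bandit strategy would restart its estimates there.
    """
    recent = deque(maxlen=window)
    older = deque(maxlen=window)
    for x in observations:
        if len(recent) == window:
            # Capture the element about to be evicted from `recent`.
            older.append(recent[0])
        recent.append(x)
        if len(older) == window:
            gap = abs(sum(recent) / window - sum(older) / window)
            if gap > threshold:
                return True
    return False
```

Restarting only at detected changes is what makes an O(k log T) regret plausible: between restarts the problem is stationary, and each of the k segments contributes a logarithmic term.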
In situ digital inline holography is a technique which can be used to acquire high‐resolution imagery of plankton and examine their spatial and temporal distributions within the water column in a nonintrusive manner. However, for effective expert identification of an organism from digital holographic imagery, it is necessary to apply a computationally expensive numerical reconstruction algorithm. This lengthy process inhibits real‐time monitoring of plankton distributions. Deep learning methods, such as convolutional neural networks, applied to interference patterns of different organisms from minimally processed holograms can eliminate the need for reconstruction and accomplish real‐time computation. In this article, we integrate deep learning methods with digital inline holography to create a rapid and accurate plankton classification network for 10 classes of organisms that are commonly seen in our data sets. We describe the procedure from preprocessing to classification. Our network achieves 93.8% accuracy when applied to a manually classified testing data set. Upon further application of a probability filter to eliminate false classification, the average precision and recall are 96.8% and 95.0%, respectively. Furthermore, the network was applied to 7500 in situ holograms collected at East Sound in Washington during a vertical profile to characterize depth distribution of the local diatoms. The results are in agreement with simultaneously recorded independent chlorophyll concentration depth profiles. This lightweight network exemplifies its capability for real‐time, high‐accuracy plankton classification and it has the potential to be deployed on imaging instruments for long‐term in situ plankton monitoring.
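The probability filter mentioned above can be sketched as a threshold on the classifier's top softmax score, leaving low-confidence holograms unclassified and trading a little recall for higher precision. The threshold value and function name here are illustrative assumptions, not the article's exact procedure:

```python
def filter_predictions(prob_rows, threshold=0.9):
    """Keep only predictions whose top class probability reaches
    `threshold`; low-confidence holograms are left unclassified.

    prob_rows: list of per-class probability lists (one row per hologram).
    Returns (index, predicted_class) pairs for accepted holograms.
    """
    accepted = []
    for i, probs in enumerate(prob_rows):
        top = max(probs)
        if top >= threshold:
            accepted.append((i, probs.index(top)))
    return accepted
```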
Parking spaces are resources that can be pooled together and shared, especially when there are complementary day-time and night-time users. We answer two design questions. First, given a quality of service requirement, how many spaces should be set aside as contingency during day-time for night-time users? Next, how can we replace the first-come-first-served access method by one that aims at optimal efficiency while keeping user preferences private?
We aim to reduce the social cost of congestion in many smart city applications. In our model of congestion, agents interact over limited resources after receiving signals from a central agent that observes the state of congestion in real time. Under natural models of agent populations, we develop new signalling schemes and show that by introducing a non-trivial amount of uncertainty in the signals, we reduce the social cost of congestion, i.e., improve social welfare. The signalling schemes are efficient in terms of both communication and computation, and are consistent with past observations of the congestion. Moreover, the resulting population dynamics converge under reasonable assumptions.
In many "smart city" applications, congestion arises in part due to the nature of signals received by individuals from a central authority. In the model of Mareček et al. [Int. J. Control 88(10), 2015], each agent uses one out of multiple resources at each time instant. The per-use cost of a resource depends on the number of concurrent users. A central authority has up-to-date knowledge of the congestion across all resources and uses randomisation to provide a scalar or an interval for each resource at each time. In this paper, the interval to broadcast per resource is obtained by taking the minima and maxima of costs observed within a time window of length r, rather than by randomisation. We show that the resulting distribution of agents across resources also converges in distribution, under plausible assumptions about the evolution of the population over time.
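The windowed broadcast rule can be sketched directly: keep the last r observed costs per resource and announce the interval formed by their minimum and maximum. Class and method names below are our own; the cost model is left as an input:

```python
from collections import deque

class IntervalBroadcaster:
    """Maintain, per resource, the costs observed in the last r steps
    and broadcast the interval [min, max] over that window, as in the
    windowed variant described above.
    """
    def __init__(self, num_resources, r):
        # A bounded deque evicts the oldest cost once r values are held.
        self.windows = [deque(maxlen=r) for _ in range(num_resources)]

    def observe(self, costs):
        """Record the current per-resource costs (one value each)."""
        for w, c in zip(self.windows, costs):
            w.append(c)

    def broadcast(self):
        """Return the (min, max) interval per resource."""
        return [(min(w), max(w)) for w in self.windows]
```

For example, with r = 3 and observed costs 1, 3, 2, 4 on one resource, the broadcast interval is (2, 4): the first observation has left the window. Unlike the randomised intervals of the earlier model, this rule is deterministic given the observation history.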