Zhihan Xiong scite author profile

Zhihan Xiong

5 Publications

6 Citation Statements Received

77 Citation Statements Given


Affiliations

Sichuan University, Stanford University

Publications


Near-Optimal Randomized Exploration for Tabular MDP

Xiong¹, Shen², Cui³ et al. 2021

Preprint


We study exploration using randomized value functions in Thompson Sampling (TS)-like algorithms in reinforcement learning. This type of algorithm enjoys appealing empirical performance. We show that when we use 1) a single random seed in each episode, and 2) a Bernstein-type magnitude of noise, we obtain a worst-case $\tilde{O}(H\sqrt{SAT})$ regret bound for episodic time-inhomogeneous Markov Decision Processes, where $S$ is the size of the state space, $A$ is the size of the action space, $H$ is the planning horizon, and $T$ is the number of interactions. This bound polynomially improves all existing bounds for TS-like algorithms based on randomized value functions and, for the first time, matches the $\Omega(H\sqrt{SAT})$ lower bound up to logarithmic factors. Our result highlights that randomized exploration can be near-optimal, which was previously only achieved by optimistic algorithms.

[…] algorithm [Agrawal et al., 2021] by a $\sqrt{SH}$ factor. This result also settles an open problem raised in Agrawal et al. [2021].

• We further design a new Bernstein-type magnitude of noise for our algorithm, and achieve an $\tilde{O}(H\sqrt{SAT})$ regret bound. To our knowledge, this is the first time that a Bernstein-type bound is used in TS-like algorithms. More importantly, our upper bound matches the $\Omega(H\sqrt{SAT})$ minimax lower bound up to logarithmic factors. Therefore, our result conveys an important conceptual message: randomized exploration can be near-optimal in reinforcement learning.

Related Work

In this section we review existing provably efficient algorithms for tabular MDP. There is a long list of sample complexity guarantees for tabular MDP [


Learning in Congestion Games with Bandit Feedback

Cui¹, Xiong², Fazel³ et al. 2022

Preprint


Learning Nash equilibria is a central problem in multi-agent systems. In this paper, we investigate congestion games, a class of games with benign theoretical structure and broad real-world applications. We first propose a centralized algorithm based on the optimism in the face of uncertainty principle for congestion games with (semi-)bandit feedback, and obtain finite-sample guarantees. Then we propose a decentralized algorithm via a novel combination of the Frank-Wolfe method and G-optimal design. By exploiting the structure of the congestion game, we show the sample complexity of both algorithms depends only polynomially on the number of players and the number of facilities, but not the size of the action set, which can be exponentially large in terms of the number of facilities. We further define a new problem class, Markov congestion games, which allows us to model the non-stationarity in congestion games. We propose a centralized algorithm for Markov congestion games, whose sample complexity again has only polynomial dependence on all relevant problem parameters, but not the size of the action set.


Regulation of the autochthonous microbial community in excess sludge for the bioconversion of carbon dioxide to acetate without exogenic hydrogen

Lin, Tan, Xiong et al. 2023

Bioresource Technology


Selective Sampling for Online Best-arm Identification

Romain¹, Xiong², Fazel³ et al. 2021

Preprint


This work considers the problem of selective sampling for best-arm identification. Given a set of potential options $Z \subset \mathbb{R}^d$, a learner aims to compute, with probability greater than $1 - \delta$, $\arg\max_{z \in Z} z^\top \theta_*$, where $\theta_*$ is unknown. At each time step, a potential measurement $x_t \in X \subset \mathbb{R}^d$ is drawn IID and the learner can either choose to take the measurement, in which case they observe a noisy measurement of $x^\top \theta_*$, or abstain from taking the measurement and wait for a potentially more informative point to arrive in the stream. Hence the learner faces a fundamental trade-off between the number of labeled samples they take and when they have collected enough evidence to declare the best arm and stop sampling. The main results of this work precisely characterize this trade-off between labeled samples and stopping time and provide an algorithm that nearly optimally achieves the minimal label complexity given a desired stopping time. In addition, we show that the optimal decision rule has a simple geometric form based on deciding whether a point is in an ellipse or not. Finally, our framework is general enough to capture binary classification, improving upon previous works.


Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning

Tan, Xiong, Dwaracherla 2020

AAAI


It is well known that quantifying uncertainty in the action-value estimates is crucial for efficient exploration in reinforcement learning. Ensemble sampling offers a relatively computationally tractable way of doing this using randomized value functions. However, it still requires a huge amount of computational resources for complex problems. In this paper, we present an alternative, computationally efficient way to induce exploration using index sampling. We use an indexed value function to represent uncertainty in our action-value estimates. We first present an algorithm to learn a parameterized indexed value function through a distributional version of temporal difference in a tabular setting and prove its regret bound. Then, from a computational point of view, we propose a dual-network architecture, Parameterized Indexed Networks (PINs), comprising one mean network and one uncertainty network to learn the indexed value function. Finally, we show the efficacy of PINs through computational experiments.


