We study distributed reinforcement learning (RL) for a network of agents. The objective is to find localized policies that maximize the (discounted) global reward. In general, scalability is a challenge in this setting because the size of the global state/action space can be exponential in the number of agents. Scalable algorithms are only known in cases where dependencies are local, e.g., between neighbors. In this work, we propose a Scalable Actor Critic framework that applies in settings where the dependencies are non-local, and we provide a finite-time error bound showing how the convergence rate depends on the depth of the dependencies in the network. Additionally, as a byproduct of our analysis, we obtain novel finite-time convergence results for a general stochastic approximation scheme and for temporal difference learning with state aggregation that apply beyond the setting of RL in networked systems.
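As a rough illustration of the scalability idea behind this line of work (this is not code from the paper; the function, its names, and the update form are assumptions), the sketch below shows a TD(0)-style update for an agent's truncated Q-estimate, keyed only by the (state, action) profile of the agent's local neighborhood, so the table size scales with the neighborhood rather than with the whole network.

```python
# Illustrative sketch (all names and the update form are assumptions): a
# TD(0)-style update for agent i's truncated Q-estimate, keyed only by the
# (state, action) profile of i's local neighborhood.
def truncated_td_update(Q_i, sa, next_sa, r_i, alpha=0.1, gamma=0.95):
    old = Q_i.get(sa, 0.0)                       # current local estimate
    target = r_i + gamma * Q_i.get(next_sa, 0.0)  # bootstrapped TD target
    Q_i[sa] = old + alpha * (target - old)        # move toward the target
    return Q_i
```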
We study a multi-agent reinforcement learning (MARL) problem where the agents interact over a given network. The goal of the agents is to cooperatively maximize the average of their entropy-regularized long-term rewards. To overcome the curse of dimensionality and to reduce communication, we propose a Localized Policy Iteration (LPI) algorithm that provably learns a near-globally-optimal policy using only local information. In particular, we show that, despite restricting each agent's attention to only its κ-hop neighborhood, the agents are able to learn a policy with an optimality gap that decays polynomially in κ. In addition, we establish finite-sample convergence of LPI to the globally optimal policy, with a bound that explicitly captures the trade-off between optimality and computational complexity in choosing κ. Numerical simulations demonstrate the effectiveness of LPI.
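A minimal helper, assuming the interaction network is given as an adjacency dict (this function and its names are hypothetical, not part of LPI), shows how an agent's κ-hop neighborhood can be computed; a localized policy for agent i then conditions only on states within this set, and a larger κ trades computation for a smaller optimality gap.

```python
from collections import deque

# Hypothetical helper (not part of LPI itself): breadth-first search for the
# set of agents within kappa hops of agent i on the interaction graph.
def k_hop_neighborhood(adj, i, kappa):
    seen, frontier = {i}, deque([(i, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == kappa:          # stop expanding at the kappa-hop boundary
            continue
        for nb in adj[node]:
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, dist + 1))
    return seen

# Usage on a path graph 0-1-2-3:
# k_hop_neighborhood({0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}, 1, 1) == {0, 1, 2}
```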
Machine-learned black-box policies are ubiquitous for nonlinear control problems. Meanwhile, crude model information is often available for these problems from, e.g., linear approximations of nonlinear dynamics. We study the problem of certifying the stability of a black-box control policy using model-based advice for nonlinear control on a single trajectory. We first show a general negative result: a naive convex combination of a black-box policy and a linear model-based policy can lead to instability, even if both policies are stabilizing. We then propose an adaptive λ-confident policy, with a coefficient λ indicating the confidence in the black-box policy, and prove its stability. In addition, under bounded nonlinearity, we show that the adaptive λ-confident policy achieves a bounded competitive ratio when the black-box policy is near-optimal. Finally, we propose an online learning approach to implement the adaptive λ-confident policy and verify its efficacy in case studies on the Cart-Pole problem and a real-world electric vehicle (EV) charging problem with covariate shift due to COVID-19.
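A minimal sketch of the convex-combination idea, assuming a linear model-based policy u = -Kx and an arbitrary black-box policy; the function names are illustrative, and the paper's adaptive rule for updating λ online is not reproduced here.

```python
import numpy as np

# Illustrative sketch (names are assumptions): blend a linear model-based
# policy u = -K x with a black-box policy, weighted by a confidence
# coefficient lam in [0, 1]; lam = 1 trusts the black box fully.
def lambda_confident_action(x, K, pi_blackbox, lam):
    u_model = -K @ x          # stabilizing linear policy from crude model info
    u_bb = pi_blackbox(x)     # machine-learned black-box policy
    return lam * u_bb + (1.0 - lam) * u_model
```

As the abstract's negative result notes, a fixed λ can destabilize the system even when both constituent policies are stabilizing, which is why λ must be adapted rather than held constant.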
We study a variant of online optimization in which the learner receives k-round delayed feedback about the hitting cost and there is a multi-step nonlinear switching cost, i.e., costs depend on multiple previous actions in a nonlinear manner. Our main result shows that a novel Iterative Regularized Online Balanced Descent (iROBD) algorithm has a constant, dimension-free competitive ratio of $O(L^{2k})$, where L is the Lipschitz constant of the switching cost. Additionally, we provide lower bounds showing that the Lipschitz condition is required and that the dependencies on k and L are tight. Finally, via reductions, we show that this setting is closely related to online control problems with delay, nonlinear dynamics, and adversarial disturbances, where iROBD directly offers constant-competitive online policies.
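As a hedged illustration of the building block iROBD iterates on, the following closed-form step solves a regularized balanced-descent objective in the special case of quadratic hitting and switching costs; the regularization weight λ and this simplification are assumptions, not the general algorithm.

```python
import numpy as np

# Special-case sketch: with hitting cost f_t(x) = 0.5*||x - v_t||^2 (minimizer
# v_t) and squared switching cost, the regularized step
#     argmin_x f_t(x) + lam * 0.5 * ||x - x_prev||^2
# has the closed form below. iROBD would repeat such a step k times,
# substituting estimates for the k actions hidden by delayed feedback.
def robd_step(v_t, x_prev, lam=1.0):
    return (v_t + lam * x_prev) / (1.0 + lam)

# Usage: x_t = robd_step(np.array([1.0, 0.0]), np.array([0.0, 0.0]), lam=0.5)
```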