2022
DOI: 10.1609/aaai.v36i7.20730
Adapt to Environment Sudden Changes by Learning a Context Sensitive Policy

Abstract: Dealing with real-world reinforcement learning (RL) tasks, we shall be aware that the environment may have sudden changes. We expect that a robust policy is able to handle such changes and adapt to the new environment rapidly. Context-based meta reinforcement learning aims at learning environment-adaptable policies. These methods adopt a context encoder to perceive the environment on-the-fly, following which a contextual policy makes environment-adaptive decisions according to the context. However, previous…
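The mechanism the abstract describes — a context encoder that summarizes recent experience, feeding a policy conditioned on that summary — can be sketched minimally as follows. This is an illustrative stand-in, not the paper's actual architecture: `encode_context`, `contextual_policy`, and the mean-pooling aggregation are hypothetical placeholders for the learned components.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_context(transitions):
    """Mean-pool recent (s, a, r, s') transitions into a fixed-size context
    vector -- a stand-in for a learned recurrent or attention-based encoder."""
    feats = np.stack([np.concatenate([s, a, [r], s_next])
                      for s, a, r, s_next in transitions])
    return feats.mean(axis=0)

def contextual_policy(state, context, W):
    """Linear contextual policy: the action depends on [state; context],
    so a shift in the inferred context changes behavior immediately."""
    x = np.concatenate([state, context])
    return np.tanh(W @ x)

state_dim, action_dim = 3, 2
ctx_dim = state_dim + action_dim + 1 + state_dim  # concat of s, a, r, s'
W = rng.normal(size=(action_dim, state_dim + ctx_dim))

# Simulate a few transitions from the current (possibly changed) environment.
transitions = []
s = rng.normal(size=state_dim)
for _ in range(5):
    a = rng.normal(size=action_dim)
    r = float(rng.normal())
    s_next = rng.normal(size=state_dim)
    transitions.append((s, a, r, s_next))
    s = s_next

z = encode_context(transitions)       # perceive the environment on-the-fly
action = contextual_policy(s, z, W)   # context-adaptive decision
print(action.shape)
```

When the environment changes suddenly, fresh transitions shift the context vector `z`, and the contextual policy adapts without retraining its weights — the adaptation burden falls on the encoder.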

Cited by 13 publications (9 citation statements)
References 11 publications (15 reference statements)
“…Another one of our goals is to find an algorithm that can perform well in all the environments with little modification, which can be useful if a plant decides to change a reactor tank, or we want to adopt a trained algorithm for a new plant. We would like to develop existing meta-learning algorithms like [49,50].…”
Section: Discussion
confidence: 99%
“…The adsorption kinetic model shown in Equations (50), (51) and (52) can also be used to describe the CEX and AEX chromatography process. The same rule applies to the boundary conditions.…”
Section: A3143 CEX and AEX Chromatography
confidence: 99%
“…Moreover, we have noticed that the generalizable reward function can lead to a better policy in the target task, which introduces an alternative way for transfer reinforcement learning other than previous policy-based transfer methods (e.g. [38,39]). We will also explore reward-based transfer reinforcement learning methods.…”
Section: Discussion
confidence: 99%
“…Meta reinforcement learning [Duan et al., 2016, Houthooft et al., 2018] studies the methodologies that enable the agent to generalize across different tasks with few-shot samples in the target tasks. In this process, we have a set of tasks for policy training, but the deployed tasks are unknown, can be OOD compared with the distribution of the training tasks [Lee et al., 2020a], and even can be varied when deployed [Luo et al., 2022]. Tasks have different definitions in different scenarios, e.g., differences in reward functions [Finn et al., 2017a, Rothfuss et al., 2019], or parameters of dynamics [Peng et al., 2018, Zhang et al., 2018a].…”
Section: Meta RL
confidence: 99%