Twin actor twin delayed deep deterministic policy gradient (TATD3) learning for batch process control

Joshi, Tanuja; Makker, Shikhar; Kodamana, Hariprasad; Kandath, Harikumar

doi:10.1016/j.compchemeng.2021.107527

Cited by 29 publications (14 citation statements)

References 33 publications


“…This case study involves a nonlinear process in which a non‐isothermal reaction takes place in a batch reactor with reactant A converting into products (see Figure 8). [ 70,108,109 ]…”

Section: Evaluation of AC Methods (mentioning; confidence: 99%)

A survey and comparative evaluation of actor‐critic methods in process control

Dutta; Upreti (2022). Can J Chem Eng.


Actor-critic (AC) methods have emerged as an important class of reinforcement learning (RL) methods that enable model-free control by acting on a process and learning from the consequences. To that end, these methods use artificial neural networks that work together to evaluate actions and predict optimal actions. This feature is highly desirable for process control, especially when knowledge about a process is limited or when the process is susceptible to uncertainties. In this work, we summarize important concepts of AC methods and survey their process control applications. This is followed by a comparative evaluation of the set-point tracking and robustness of controllers based on five prominent AC methods, namely DDPG, TD3, SAC, PPO, and TRPO, in five case studies of varying process nonlinearity. The training demands and control performances indicate the superiority of the DDPG and TD3 methods, which rely on an off-policy, deterministic search for optimal action policies. Overall, the knowledge base and results of this work are expected to serve practitioners in their efforts toward further development of autonomous process control strategies.
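The abstract credits DDPG and TD3 with the best results. What TD3 adds on top of DDPG is visible in its critic target: the target action is smoothed with clipped Gaussian noise, and the bootstrap value is the minimum of two target critics. The following is a minimal scalar sketch of that target for one transition; the function name `td3_target`, the toy networks in the test, and the default hyperparameters are illustrative assumptions, not values from the surveyed paper.

```python
import random


def clip(x, lo, hi):
    """Clamp x to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(r, s_next, done, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               a_low=-1.0, a_high=1.0, rng=random):
    """TD3 critic target for one transition (scalar state and action).

    Hypothetical helper for illustration; defaults are assumptions.
    """
    # Target policy smoothing: perturb the target action with clipped noise.
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    a_next = clip(target_actor(s_next) + noise, a_low, a_high)
    # Clipped double-Q: bootstrap from the smaller of the two target critics.
    q_min = min(target_q1(s_next, a_next), target_q2(s_next, a_next))
    # No bootstrapping past a terminal transition (done = 1.0).
    return r + gamma * (1.0 - done) * q_min
```

Taking the minimum of two critics counters the overestimation bias that a single DDPG critic is prone to; the deterministic actor itself is then trained off-policy against one of the critics.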


“…where $\tilde{a}_{\phi_A} = \tanh\big(\mu_{\phi_A}(s) + \sigma_{\theta}(s) \cdot \xi\big)$ and $\xi \sim \mathcal{N}(0, 1)$. Recent works in the literature have explored the deployment of an ensemble of actor networks in the actor-critic RL framework [14]. In complex environments where the best strategy cannot be represented by a single network, multiple actor networks can be used as a potential solution to learn the optimal policy.…”

Section: Maximum Entropy RL and SAC (mentioning; confidence: 99%)
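The quoted action expression is the reparameterization trick with tanh squashing used by SAC-style actors: the noise ξ is drawn independently of the policy parameters, so the action remains differentiable in the mean and standard deviation. A minimal scalar sketch follows, where `mu` and `sigma` are hypothetical stand-ins for the outputs $\mu_{\phi_A}(s)$ and $\sigma_{\theta}(s)$. The log-probability with the tanh change-of-variables correction is the standard SAC form and is included here as an assumption about how such an actor is typically trained; it is not stated in the quoted passage.

```python
import math
import random


def sample_squashed_action(mu, sigma, rng=random):
    """Reparameterized sample a = tanh(mu + sigma * xi), with xi ~ N(0, 1).

    mu, sigma: hypothetical actor outputs for one state (sigma > 0).
    Returns the squashed action and its log-probability.
    """
    xi = rng.gauss(0.0, 1.0)   # noise independent of the policy parameters
    u = mu + sigma * xi        # pre-squash Gaussian sample
    a = math.tanh(u)           # squash into the bounded interval (-1, 1)
    # Gaussian log-density of u, written in terms of xi = (u - mu) / sigma.
    log_gauss = -0.5 * xi ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)
    # Change of variables for tanh; 1e-6 guards against log(0) when |a| -> 1.
    log_prob = log_gauss - math.log(1.0 - a ** 2 + 1e-6)
    return a, log_prob
```

In a batch-process setting, the squashed action in (-1, 1) would then be rescaled to the physical actuator range, for example a coolant flow or jacket temperature.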

“…This kind of model-free RL approach addresses the main limitation of model-based control approaches by eliminating the requirement for a high-fidelity process model. If even an approximate process model is available, it can be used in the offline learning stage and for data generation [14], thereby significantly reducing the data requirement and the associated safety risk. The online computational burden is thus relatively low, as the policy obtained through offline learning can serve as a warm start during online implementation.…”

Section: Introduction (mentioning; confidence: 99%)

“…Zhang et al. have proposed a SAC-based control algorithm for minimizing operational costs and ensuring power reliability in an integrated power, heat and natural-gas system [32]. However, the use of SAC as an RL algorithm in the process control domain has not been reported in the literature [33,34,35,14]. The aforementioned algorithms update the policy by gradient ascent and can therefore become trapped in local optima rather than reaching the global optimum. Recently, there has been interest in the research community in employing an ensemble of actor networks to improve the performance of RL agents [35,36,37,33,14].…”

Section: Introduction (mentioning; confidence: 99%)

“…However, the use of SAC as an RL algorithm in the process control domain has not been reported in the literature [33,34,35,14]. The aforementioned algorithms update the policy by gradient ascent and can therefore become trapped in local optima rather than reaching the global optimum. Recently, there has been interest in the research community in employing an ensemble of actor networks to improve the performance of RL agents [35,36,37,33,14]. The idea is that multiple actors allow the agent to explore different paths instead of being restricted to the single path that one policy would follow.…”

Section: Introduction (mentioning; confidence: 99%)
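The quoted passages motivate an ensemble of actors: several policies propose actions, so the agent is not tied to a single path through the policy space. The statements do not say how one action is chosen from the proposals. One simple rule, assumed here and not taken from the TATD3 or TASAC papers, is to execute the proposal that the critic scores highest. A minimal sketch with hypothetical `actors` and `critic` callables:

```python
def select_action(state, actors, critic):
    """Return (index, action) of the actor proposal with the highest critic value.

    Assumption: the ensemble is combined by greedy selection under the critic;
    other rules (averaging proposals, one actor per episode) are also possible.
    """
    proposals = [actor(state) for actor in actors]
    scores = [critic(state, a) for a in proposals]
    best = max(range(len(proposals)), key=scores.__getitem__)
    return best, proposals[best]


if __name__ == "__main__":
    # Two hypothetical linear actors and a critic that prefers actions near 0.8.
    actors = [lambda s: 0.2 * s, lambda s: 0.9 * s]
    critic = lambda s, a: -(a - 0.8) ** 2
    print(select_action(1.0, actors, critic))  # prints (1, 0.9)
```

Because every actor is still trained, a proposal that loses the selection in one state can win in another, which is one way the ensemble keeps alternative behaviours alive during learning.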


TASAC: a twin-actor reinforcement learning framework with stochastic policy for batch process control

Joshi; Kodamana; Kandath et al. (2022). Preprint.

Self Cite


Due to their complex nonlinear dynamics and batch-to-batch variability, batch processes pose a challenge for process control. The absence of accurate models and the resulting plant-model mismatch make these problems even harder to address for advanced model-based control strategies. Reinforcement learning (RL), in which an agent learns a policy by directly interacting with the environment, offers a potential alternative in this context. RL frameworks with an actor-critic architecture have recently become popular for controlling systems with continuous state and action spaces. It has been shown that an ensemble of actor and critic networks further helps the agent learn better policies, owing to the enhanced exploration that simultaneous policy learning provides. To this end, the current study proposes a stochastic actor-critic RL algorithm, termed Twin Actor Soft Actor-Critic (TASAC), which incorporates an ensemble of actors for learning within a maximum entropy framework, for batch process control.
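In a maximum entropy framework, the critic target rewards policy randomness: the bootstrap value is the smaller of two target-critic values minus a temperature-weighted log-probability of the next action. The sketch below shows this standard SAC-style soft target for one transition, for context only; the abstract does not describe how TASAC combines its twin actors inside the target, so that part is not modeled. The function name and default values of `alpha` and `gamma` are assumptions.

```python
def soft_q_target(r, done, q1_next, q2_next, log_prob_next, alpha=0.2, gamma=0.99):
    """Maximum-entropy (SAC-style) critic target for one transition.

    q1_next, q2_next: target-critic values at (s', a') with a' ~ pi(.|s').
    log_prob_next: log pi(a'|s'); subtracting alpha * log_prob_next adds an
    entropy bonus, so more random policies receive a higher soft value.
    alpha, gamma: assumed temperature and discount factor.
    """
    soft_value = min(q1_next, q2_next) - alpha * log_prob_next
    return r + gamma * (1.0 - done) * soft_value
```

A larger `alpha` pushes the agent toward more exploratory behaviour, which is the mechanism a stochastic, maximum-entropy method relies on, in contrast to the deterministic target used by TD3.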


Distributional reinforcement learning for run-to-run control in semiconductor manufacturing processes

Zhu; Pan (2023). Neural Comput & Applic.
