2022
DOI: 10.1002/aic.17658
Integration of reinforcement learning and model predictive control to optimize semi‐batch bioreactor

Abstract: As the digital transformation of the bioprocess is progressing, several studies propose to apply data‐based methods to obtain a substrate feeding strategy that minimizes the operating cost of a semi‐batch bioreactor. However, the negligent application of model‐free reinforcement learning (RL) has a high chance to fail on improving the existing control policy because the available amount of data is limited. In this article, we propose an integrated algorithm of double‐deep Q‐network and model predictive control…

Cited by 30 publications (8 citation statements) | References 52 publications

Citation statements, ordered by relevance:
“…The actor networks in SPER-SAC are parameterized DNNs with 256-node hidden layers, so the number of nodes in each layer is [12, 256, 256, 2]; the input is the vector of process variables listed in Table 1 and the output is the set value of the substrate feed flow rate in the form of a Gaussian distribution. In contrast, the number of network nodes in each layer of the critic network is [13, 256, 256, 1], and its input consists of the process variables and the output of the actor network.…”
Section: Training Environment (mentioning)
confidence: 99%
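The quoted layer sizes suggest a soft actor-critic style layout: a 12-dimensional process-state input, two 256-node hidden layers, a Gaussian policy head for the feed rate, and a critic that takes the state concatenated with the action. The PyTorch sketch below follows only those stated sizes; the activations, log-std clamping range, and class names are assumptions, not taken from the cited work.

```python
# Minimal sketch of the actor/critic layout described in the quoted statement.
# Layer sizes follow the quote ([12, 256, 256, 2] actor, [13, 256, 256, 1] critic);
# activations and other details are assumed, not from the cited paper.
import torch
import torch.nn as nn


class GaussianActor(nn.Module):
    def __init__(self, state_dim: int = 12, hidden: int = 256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),  # mean and log-std of the feed-rate Gaussian
        )

    def forward(self, state: torch.Tensor) -> torch.distributions.Normal:
        mean, log_std = self.body(state).chunk(2, dim=-1)
        return torch.distributions.Normal(mean, log_std.clamp(-20, 2).exp())


class Critic(nn.Module):
    def __init__(self, state_dim: int = 12, action_dim: int = 1, hidden: int = 256):
        super().__init__()
        self.q = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),  # 13 inputs: state + action
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # scalar Q-value
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.q(torch.cat([state, action], dim=-1))
```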
“…To address these challenges, reinforcement learning (RL) has been shown to be a potential alternative to traditional control methods [10][11][12]. First, as a data-driven approach, the agent (analogous to the controller) in RL learns the control action (analogous to the control output) by interacting directly with the environment (analogous to the system/process) at each time step [13].…”
Section: Introduction (mentioning)
confidence: 99%
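As a plain illustration of the agent-environment interaction described in that statement, the loop below collects transitions from a generic environment at each time step; the `env` and `policy` interfaces are placeholders assumed for the sketch, not the API of any cited work.

```python
# Generic agent-environment interaction loop: the agent observes the process
# state, applies a control action (e.g. a substrate feed rate), and receives a
# reward at each time step. The environment and policy objects are placeholders.
def run_episode(env, policy, horizon: int = 100):
    state = env.reset()
    transitions = []
    for _ in range(horizon):
        action = policy(state)                       # control output chosen by the agent
        next_state, reward, done = env.step(action)  # process responds and returns a reward
        transitions.append((state, action, reward, next_state))
        state = next_state
        if done:
            break
    return transitions  # experience later used to improve the policy
```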
“…Model-free reinforcement learning, on the other hand, obtains optimal strategies through real-time interaction between the agent and the environment, without requiring a precise process model. For example, Lee et al. integrated MPC with the double-deep Q-network algorithm to obtain the optimal substrate feeding strategy in industrial-scale penicillin production, effectively reducing the operating cost of semi-batch bioreactors [104]. Benton et al. optimized the feeding process for cyanobacterial phycocyanin (C-PC) production with the Asynchronous Advantage Actor-Critic (A3C) algorithm using asynchronous learning control, ultimately increasing product yield by 52.1% [105].…”
Section: Development of Modeling (mentioning)
confidence: 99%
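Since that statement names the double-deep Q-network algorithm, the sketch below shows the standard double-DQN target computation (the online network selects the next action, the target network evaluates it). It is a generic illustration under assumed discrete-action Q-networks, not the integrated DDQN/MPC scheme of the cited paper.

```python
# Standard double-DQN target: decouple action selection (online network) from
# action evaluation (target network) to reduce overestimation of Q-values.
# Generic illustration only; not the cited paper's DDQN/MPC integration.
import torch


def double_dqn_target(reward, next_state, done, online_q, target_q, gamma: float = 0.99):
    with torch.no_grad():
        best_next = online_q(next_state).argmax(dim=-1, keepdim=True)       # select action
        next_val = target_q(next_state).gather(-1, best_next).squeeze(-1)   # evaluate action
        return reward + gamma * (1.0 - done) * next_val                     # bootstrapped target
```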
“…Deep neural networks (DNNs) have been immensely successful for building autonomous systems for forecasting [1], monitoring [2], fault detection [3], and control [4], to name a few. The performance of DNNs is heavily reliant on the fidelity and quantity of data used for training.…”
Section: Introduction (mentioning)
confidence: 99%