2022
DOI: 10.23919/csms.2022.0002
Q-Learning-Based Teaching-Learning Optimization for Distributed Two-Stage Hybrid Flow Shop Scheduling with Fuzzy Processing Time

Abstract: Two-stage hybrid flow shop scheduling has been extensively considered in single-factory settings. However, the distributed two-stage hybrid flow shop scheduling problem (DTHFSP) with fuzzy processing time is seldom investigated in multiple factories. Furthermore, the integration of reinforcement learning and metaheuristics is seldom applied to solve DTHFSP. In the current study, DTHFSP with fuzzy processing time was investigated, and a novel Q-learning-based teaching-learning-based optimization (QTLBO) was constructed…

Cited by 29 publications (13 citation statements). References 48 publications.
“…As previously stated, the Q-learning agent is the BS, whose aim is to boost the accumulative transmission sum rate. Therefore, two value functions can be inspected while considering the RL maximization problem [34, 36, 42]: the first one is the state value function $V^{\pi}(s)=\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty}\gamma^{t}r_{t}\,\middle|\,s_{0}=s\right]$ and the other one is the state-action value function $Q^{\pi}(s,a)=\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty}\gamma^{t}r_{t}\,\middle|\,s_{0}=s,\,a_{0}=a\right]$, where $\mathbb{E}_{\pi}[\cdot]$ denotes the expected value given that the agent follows a certain policy $\pi$ within the applied procedure. Due to unspecified transition probabilities and limited observed states, an optimal policy is difficult to achieve.…”
Section: Channel Estimation Based Q-Learning Algorithm (mentioning)
confidence: 99%
“…Therefore, the Q-learning procedure is developed to approximately achieve the best possible policy. In the developed Q-learning procedure, the state-action value function values are learned via trial and error and are updated according to the following formula [15, 34, 36, 42]: $Q(s,a)\leftarrow Q(s,a)+\alpha\!\left[r+\gamma\max_{a'}Q(s',a')-Q(s,a)\right]$, where $\alpha$ is the learning rate, $s'$ denotes the new state, and $a'$ is the new action that will be considered by the agent from the action space to maximize the new state-action value function $Q(s',a')$.…”
Section: Channel Estimation Based Q-Learning Algorithm (mentioning)
confidence: 99%
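As an illustration of the update rule quoted above, here is a minimal tabular Q-learning sketch in Python. The hyperparameters, action set, and environment hooks are assumptions for illustration only; neither the indexed paper nor the citing work specifies them.

```python
import random
from collections import defaultdict

# Assumed hyperparameters (not taken from either paper).
ALPHA = 0.1    # learning rate (alpha in the quoted update formula)
GAMMA = 0.9    # discount factor
EPSILON = 0.2  # exploration probability for epsilon-greedy selection

ACTIONS = [0, 1, 2]     # hypothetical discrete action space
Q = defaultdict(float)  # Q[(state, action)] -> estimated value, 0.0 by default

def choose_action(state):
    """Epsilon-greedy: explore at random, otherwise pick the best-known action."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """Apply Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```

The trial-and-error learning the quote describes then amounts to repeatedly calling choose_action, observing the reward and next state from the environment, and calling update.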
“…Multi-objective optimization problems (MOPs) have long been a focus of academic and engineering fields. Many real-world problems are MOPs, such as big data [1,2], image processing [3,4], feature selection [5,6], community detection [7], engineering design [8,9], shop floor scheduling [10,11], and medical services [12]. Usually, the objectives in these problems are conflicting and mutually constrained, and the improvement of one objective may lead to the deterioration of another.…”
Section: Introduction (mentioning)
confidence: 99%
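The conflicting-objectives point is what motivates Pareto-based methods; a minimal Pareto-dominance check makes it concrete (minimization assumed; a generic illustration, not code from any cited work):

```python
def dominates(a, b):
    """Return True if objective vector `a` Pareto-dominates `b` under minimization:
    a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

# Two solutions that trade off against each other: neither dominates the other,
# which is exactly the "conflicting and mutually constrained" situation described.
print(dominates((1.0, 5.0), (2.0, 3.0)))  # False
print(dominates((2.0, 3.0), (1.0, 5.0)))  # False
print(dominates((1.0, 3.0), (2.0, 3.0)))  # True: better on one, equal on the other
```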
“…It can make good decisions in the face of high-dimensional, complex environments, and has been applied to load forecasting [8] and multi-agent reinforcement learning [9]. Xi and Lei [10] presented the QTLBO and OTLBO algorithms, which integrate Q-learning and metaheuristics, to proficiently address the challenge of solving the distributed two-stage hybrid flow shop scheduling problem characterized by fuzzy processing times. Kushwaha et al. [11] introduced a pioneering approach grounded in Q-learning, enabling intelligent wind-speed-sensorless maximum power point tracking.…”
Section: Introduction (mentioning)
confidence: 99%
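The hybrid pattern these citing papers describe (a Q-learning agent steering a metaheuristic) can be sketched as follows. The state signal, operator set, and reward shaping here are simplified assumptions for illustration; they are not the actual QTLBO design from Xi and Lei [10].

```python
import random

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2  # assumed hyperparameters
OPERATORS = ["teacher_phase", "learner_phase", "local_search"]  # assumed action set
STATES = ["improved", "stagnant"]  # assumed state signal: did the last step help?

Q = {(s, op): 0.0 for s in STATES for op in OPERATORS}

def select_operator(state):
    """Epsilon-greedy choice among search operators based on learned Q-values."""
    if random.random() < EPSILON:
        return random.choice(OPERATORS)
    return max(OPERATORS, key=lambda op: Q[(state, op)])

def optimize(apply_operator, evaluate, population, iterations=200):
    """apply_operator(name, pop) -> new pop; evaluate(pop) -> best cost (minimized)."""
    state, best = "stagnant", evaluate(population)
    for _ in range(iterations):
        op = select_operator(state)
        population = apply_operator(op, population)
        new_best = evaluate(population)
        reward = 1.0 if new_best < best else -0.1  # assumed reward: pay for improvement
        next_state = "improved" if new_best < best else "stagnant"
        best_next = max(Q[(next_state, o)] for o in OPERATORS)
        Q[(state, op)] += ALPHA * (reward + GAMMA * best_next - Q[(state, op)])
        state, best = next_state, min(best, new_best)
    return population, best
```

Over a run, the agent learns which operator tends to pay off in each search state; this is the general sense in which Q-learning is "integrated with" the metaheuristic in such hybrids.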