“…To address the fourth expected contribution, we customize a MADRL algorithm, namely WQMIX (Rashid, Farquhar, Peng, & Whiteson, 2020), for the case where system states are fully observable, in order to obtain cost-effective policies. Specifically, the algorithm uses the branching dueling network architecture (Tavakoli, Pardo, & Kormushev, 2018), so that the output layer of the deep Q-networks grows only linearly with the number of system components, and the monotonic decomposition scheme for joint action-value functions (Rashid et al., 2020), so that maintenance decisions remain consistent at the component and system levels.…”
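The two properties named in the excerpt can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the three-action-per-component setup, the Q-values, and the fixed non-negative linear mixing weights are all assumptions made for illustration. In QMIX-style methods the mixing weights are produced by a state-conditioned network; a fixed weighted sum is used here only because it is the simplest monotonic mixer.

```python
from itertools import product


def output_sizes(n_components, n_actions):
    """Return (joint, branching) output-layer sizes.

    A single joint Q-head needs one output per joint action (exponential),
    while a branching architecture needs one output per component-action
    pair (linear).
    """
    joint = n_actions ** n_components
    branching = n_components * n_actions
    return joint, branching


def monotonic_mix(q_values, weights, bias=0.0):
    """Toy monotonic mixer: Q_tot = sum_i w_i * Q_i + bias with w_i >= 0.

    Non-negative weights make Q_tot non-decreasing in every Q_i.
    This linear form is an assumption, not the mixer used in the paper.
    """
    assert all(w >= 0 for w in weights), "weights must be non-negative"
    return sum(w * q for w, q in zip(weights, q_values)) + bias


# Hypothetical per-component Q-values: 3 components, 3 actions each.
q_table = [
    [0.2, 1.0, -0.5],
    [0.7, 0.1, 0.3],
    [-1.0, 0.0, 0.4],
]
weights = [0.5, 2.0, 1.0]  # hypothetical non-negative mixing weights

n_components = len(q_table)
n_actions = len(q_table[0])

# Each component greedily picks its own best action.
greedy_per_component = tuple(
    max(range(n_actions), key=q.__getitem__) for q in q_table
)

# Brute-force search over all joint actions for the best mixed value.
best_joint = max(
    product(range(n_actions), repeat=n_components),
    key=lambda a: monotonic_mix(
        [q_table[i][a[i]] for i in range(n_components)], weights
    ),
)

print(output_sizes(10, 3))
print(greedy_per_component, best_joint)
```

In this toy setting the component-wise greedy actions coincide with the joint action that maximizes the mixed value, which is the consistency that monotonic decomposition is meant to provide. The size comparison shows why a branching output layer stays tractable as components are added.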