“…Value-based algorithms learn the optimal value function and then use it to derive the optimal policy, whereas policy-based algorithms learn the optimal policy directly. Actor-critic methods [69,94] are hybrid approaches that use policy-based methods to improve a policy while also evaluating it by estimating its corresponding value function. Several studies, including [22,95], investigated the adaptability of value-based algorithms to environmental changes.…”
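As a minimal sketch of the distinction drawn above, the following Python contrasts a value-based learner (tabular Q-learning, which learns action values and then reads a greedy policy off them) with a one-step actor-critic (a softmax policy improved by a policy-gradient step, with a learned state-value estimate acting as the critic). None of it comes from the cited works: the four-state chain environment, the function names, the learning rates, and the episode counts are all assumptions chosen for illustration.

```python
import math
import random

# Hypothetical toy problem (an assumption, not from the cited works):
# a chain of states 0..3; state 3 is terminal and yields reward 1.
N_STATES, GOAL, GAMMA = 4, 3, 0.9
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic transition; reward 1 only on reaching the goal."""
    nxt = max(state - 1, 0) if action == 0 else state + 1
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done


def softmax(prefs):
    """Numerically stable softmax over action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def q_learning(episodes=500, alpha=0.5, eps=0.2, seed=0):
    """Value-based: learn Q(s, a), then derive the greedy policy from it."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # step cap keeps every episode finite
            if rng.random() < eps:
                a = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[s])  # exploit, breaking ties at random
                a = rng.choice([b for b in ACTIONS if q[s][b] == best])
            s2, r, done = step(s, a)
            target = r if done else r + GAMMA * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    # The policy is not learned directly; it is read off the value estimates.
    return [max(ACTIONS, key=lambda b: q[s][b]) for s in range(GOAL)]


def actor_critic(episodes=2000, alpha_v=0.2, alpha_pi=0.2, seed=0):
    """Actor-critic: the critic estimates V(s); the actor updates the policy."""
    rng = random.Random(seed)
    v = [0.0] * N_STATES                           # critic: state values
    theta = [[0.0, 0.0] for _ in range(N_STATES)]  # actor: action preferences
    for _ in range(episodes):
        s = 0
        for _ in range(100):
            pi = softmax(theta[s])
            a = 0 if rng.random() < pi[0] else 1
            s2, r, done = step(s, a)
            # One-step TD error: the critic's evaluation of the action taken.
            delta = r + (0.0 if done else GAMMA * v[s2]) - v[s]
            v[s] += alpha_v * delta
            # Policy-gradient step for a softmax policy, weighted by delta.
            # (The per-step discount weighting is omitted, a common simplification.)
            for b in ACTIONS:
                grad = (1.0 if b == a else 0.0) - pi[b]
                theta[s][b] += alpha_pi * delta * grad
            s = s2
            if done:
                break
    # Probability of moving right in each non-terminal state.
    return [softmax(theta[s])[1] for s in range(GOAL)]


if __name__ == "__main__":
    print("greedy policy from Q-learning:", q_learning())
    print("actor-critic P(right | s):   ", actor_critic())
```

In this sketch the Q-learning function returns a policy only as a by-product of its value table, whereas the actor-critic keeps an explicit policy (`theta`) and uses the value estimate (`v`) solely to score the policy's actions. Both are expected to end up preferring the "right" action in every non-terminal state of the toy chain.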