“…Exploration in RL: Exploration is one of the central challenges in model-free RL, since convergence guarantees for the Q-function rest on the assumption that all state-action pairs are visited infinitely often [56]. To encourage visiting diverse state-action pairs in the joint state-action space, prior works have considered various methods: intrinsically-motivated rewards based on curiosity [5,11], model prediction error [1,10], information gain [26,28,29], and state-visitation counts [33,35]. These exploration techniques improve performance in challenging sparse-reward environments [3,10,13].…”
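To make the count-based idea concrete, a minimal sketch of a tabular visitation-count bonus is shown below. The bonus form `beta / sqrt(N(s))`, the coefficient `beta`, and the dictionary-based counter are illustrative assumptions for small discrete state spaces, not the specific formulations of the cited works.

```python
import math
from collections import defaultdict

def count_bonus(counts, state, beta=0.1):
    """Count-based exploration bonus beta / sqrt(N(s)).

    `beta` and the tabular `counts` dict are illustrative choices;
    the agent adds this bonus to its extrinsic reward so that
    rarely visited states yield a larger total reward.
    """
    counts[state] += 1  # increment visitation count N(s)
    return beta / math.sqrt(counts[state])

# Usage: augment a sparse extrinsic reward with the bonus.
counts = defaultdict(int)
r_ext = 0.0                                   # sparse extrinsic reward
r_total = r_ext + count_bonus(counts, (0, 0))  # first visit: bonus = 0.1
```

In sparse-reward tasks, this shaped reward gives the agent a learning signal even before any extrinsic reward is observed; the bonus decays as states become familiar, so exploration tapers off over training.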