“…As a result, the learning performance would depend on the effective size of the state space (or the effective number of unknown parameters). Various notions of structures have been studied in MDPs, which include the Lipschitz continuity of MDP parameters (e.g., rewards and transition functions) [10, 11, 12, 13], factorization structure [14, 15, 16], and equivalence relations [17, 18, 19, 20, 21, 22]. These works reveal that exploiting the underlying structure in the environment in various RL tasks leads to massive empirical performance gain (over structure-oblivious algorithms) and to significantly improved performance bounds.…”