“…One commonly used technique for solving such partially observable problems is to model the dynamics of the environments firstly, for example, the Partially Observable Markov Decision Processes (POMDP) (Kaelbling, Littman, & Cassandra, 1998;Ross, Pineau, Paquet, & Chaib-Draa, 2008) and Predictive State Representations (PSRs) (Littman, Sutton, & Singh, 2001;Liu, Tang, & Zeng, 2015;Liu, Zhu, Zeng, & Dai, 2016;Talvitie & Singh, 2011) approach, and then the problem can be solved using the obtained model. Although POMDPs and PSRs provide general frameworks to solve partially observable problems, they rely heavily on a known and accurate model of the environment (Liu, Yang, & Ji, 2014;Spaan & Vlassis, 2005;Pineau, Gordon, & Thrun, 2006;Ye, Somani, Hsu, & Lee, 2017). However, in real-world applications it is extremely difficult to build an accurate model.…”