“…Narasimhan et al. (2015) demonstrate that "Deep-Q Networks" (DQN) (Mnih et al., 2015) …

[Table: agent scores on six benchmark interactive-fiction games; "-" marks results that were not reported. The column headers (game names) are not part of this excerpt.]

(Fulda et al., 2017a)                          0.59  0.03  0.00  0.10  0.00  0.01
Golovin (Kostka et al., 2017)                  0.20  0.04  0.10  0.15  0.00  0.01
AE-DQN (Zahavy et al., 2018)                   -     0.05  -     -     -     -
NeuroAgent (Rajalingam and Samothrakis, 2019)  0.19  0.03  0.00  0.20  0.00  0.00
NAIL (Hausknecht et al., 2019)                 0.38  0.03  0.26  -     0.00  0.00
CNN-DQN (Yin and May, 2019a)                   -     0.11  -     -     -     -
IK-OMP (Tessler et al., 2019)                  -     1.00  -     -     -     -
TDQN                                           0.47  0.03  0.00  0.34  0.02  0.00
KG-A2C                                         0.58  0.10  0.01  0.06  0.03  0.01
SC (Jain et al., 2020)                         -     0.10  -     -     0.0   -
CALM (N-gram) (Yao et al., 2020)               0.79  0.07  0.00  0.09  0.00  0.00
CALM (GPT-2) (Yao et al., 2020)                0.80  0.09  0.07  0.14  0.05  0.01
RC-DQN (Guo et al., 2020a)                     0.81  0.11  0.40  0.20  0.05  0.02
MPRC-DQN (Guo et al., 2020a)                   0.88  0.11  0.52  0.20  0.05  0.02
SHA-KG (Xu et al., 2020)                       0.86  0.10  0.10  -     0.05  0.02
MC!Q*BERT (Ammanabrolu et al., 2020b)          0.92  0.12  -     -     0.00  -
INV-DY (Yao et al., 2021)                      0.81  0.12  0.06  0.11  0.05  -

To support these modelling paradigms, Zelinka et al. (2019) introduce TextWorld KG, a dataset for learning the subtask of updating knowledge graphs based on text-world descriptions in a cooking domain, and show that their best ensemble model achieves an F1 of 70 on this subtask. Similarly, Ammanabrolu et al. (2021a) introduce JerichoWorld, a similar dataset for world modelling with knowledge graphs but covering a broader set of interactive fiction games, and subsequently introduce the Worldformer (Ammanabrolu and Riedl, 2021b), a multitask transformer model that performs well at both knowledge-graph prediction and next-action prediction.

Question Answering: Agents can reframe Text World tasks as question-answering tasks to gain knowledge relevant to action selection; such agents currently provide state-of-the-art performance across a variety of benchmarks.…”
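The knowledge-graph world-modelling subtask described above (as in TextWorld KG and JerichoWorld) can be pictured as maintaining a set of relation triples and applying predicted updates after each observation. The following is a minimal sketch under that assumption; the `update_graph` function and the ADD/DEL operation format are hypothetical illustrations, not the API of either dataset's reference implementation.

```python
# Sketch: the agent's belief state is a set of (subject, relation, object)
# triples; a learned model is assumed to emit ADD/DEL operations after each
# textual observation. (Hypothetical format, for illustration only.)

def update_graph(state, operations):
    """Apply predicted graph-update operations to the current belief state.

    operations: iterable of ("ADD" | "DEL", (subj, rel, obj)) pairs.
    Returns a new set of triples; the input state is not mutated.
    """
    new_state = set(state)
    for op, triple in operations:
        if op == "ADD":
            new_state.add(triple)
        elif op == "DEL":
            new_state.discard(triple)
        else:
            raise ValueError(f"unknown operation: {op}")
    return new_state

# Toy example: the agent takes a knife from the kitchen counter.
state = {("knife", "on", "counter"), ("player", "in", "kitchen")}
ops = [("DEL", ("knife", "on", "counter")),
       ("ADD", ("player", "has", "knife"))]
state = update_graph(state, ops)
```

In practice the operations would come from a trained sequence model conditioned on the observation text, and the resulting graph serves as input to the action-selection policy.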
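The question-answering reframing mentioned above can be sketched as posing a question about the current observation and selecting the candidate action that best "answers" it. The token-overlap scorer below is a deliberately simple stand-in for a learned reading-comprehension model (agents such as MPRC-DQN use trained networks for this step); the function names and the fixed question string are illustrative assumptions.

```python
# Sketch: action selection as question answering. A toy lexical-overlap
# scorer substitutes for a trained QA model, purely for illustration.

def score_action(observation, question, action):
    """Toy QA score: fraction of the action's tokens found in the context."""
    context = set((observation + " " + question).lower().split())
    action_tokens = set(action.lower().split())
    return len(context & action_tokens) / max(len(action_tokens), 1)

def select_action(observation, candidates,
                  question="What should you do next?"):
    """Pick the candidate action that best 'answers' the question."""
    return max(candidates, key=lambda a: score_action(observation, question, a))

obs = "You are in the kitchen. A rusty key lies on the table."
actions = ["go north", "take key", "open mailbox"]
best = select_action(obs, actions)  # "take key" overlaps the observation
```

A learned model replaces the overlap score with an extractive or generative QA head, but the control flow (question, context, scored candidate answers) is the same.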