Text-based games present a unique challenge for autonomous agents: they must operate in natural language and handle enormous action spaces. In this paper, we propose the Contextual Action Language Model (CALM) to generate a compact set of action candidates at each game state. Our key insight is to train language models on human gameplay, where people demonstrate linguistic priors and a general game sense for promising actions conditioned on game history. We combine CALM with a reinforcement learning agent that re-ranks the generated action candidates to maximize in-game rewards. We evaluate our approach on the Jericho benchmark (Hausknecht et al., 2019a), using games unseen by CALM during training. Our method obtains a 69% relative improvement in average game score over the previous state-of-the-art model. Surprisingly, on half of these games, CALM is competitive with or better than other models that have access to ground-truth admissible actions.

* Code and data are available at https://github.com/princeton-nlp/calm-textgame.

Figure 1: Sample gameplay from ZORK1 along with action sets generated by two variants of CALM. The game recognizes a vocabulary of 697 words, resulting in more than 697^4 ≈ 200 billion potential 4-word actions. 'move rug' is the optimal action to take here and is generated by our method as a candidate.

Observation: You are in the living room. There is a doorway to the east, a wooden door with strange gothic lettering to the west, which appears to be nailed shut, a trophy case, and a large oriental rug in the center of the room. You are carrying: A brass lantern . . .
Random Actions: close door, north a, eat troll with egg, . . .
CALM (n-gram) Actions: enter room, leave room, lock room, open door, close door, knock on door, . . .
CALM (GPT-2) Actions: east, open case, get rug, turn on lantern, move rug, unlock case with key, . . .
Next Observation: With a great effort, the rug is moved to one side of the room, revealing the dusty cover of a closed trap door...
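To make the scale concrete, here is a back-of-the-envelope sketch of the action-space count from the caption, plus a miniature version of the generate-then-re-rank pipeline the abstract describes. The two functions are illustrative placeholder stand-ins, not the paper's actual models: CALM would sample candidates from a trained language model, and the re-ranker would use a learned action-value estimate.

```python
# Size of the naive action space described above: every ordered
# 4-word string over a 697-word vocabulary.
VOCAB_SIZE = 697
naive_action_space = VOCAB_SIZE ** 4
print(f"{naive_action_space:,}")  # 236,010,384,481 — over 200 billion

# CALM-style pipeline in miniature: a language model proposes a small
# candidate set, and an RL agent re-ranks it by estimated value.
def generate_candidates(observation: str, k: int = 4) -> list[str]:
    # Placeholder for CALM (GPT-2): would sample top-k actions from an
    # LM conditioned on the game history.
    return ["east", "open case", "get rug", "move rug"][:k]

def q_value(observation: str, action: str) -> float:
    # Placeholder for the RL agent's learned action-value estimate.
    return {"move rug": 1.0, "open case": 0.4}.get(action, 0.1)

obs = "You are in the living room. ... a large oriental rug ..."
candidates = generate_candidates(obs)
best = max(candidates, key=lambda a: q_value(obs, a))
print(best)  # move rug
```

The point of the design is that the RL agent only ever scores a handful of LM-proposed candidates instead of searching the full 200-billion-action space.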
Multi-scale feature fusion plays a critical role in salient object detection. Most existing methods achieve remarkable performance by exploiting various multi-scale feature fusion strategies. However, an elegant fusion framework requires expert knowledge and experience, relying heavily on laborious trial and error. In this paper, we propose a multi-scale feature fusion framework based on Neural Architecture Search (NAS), named Auto-MSFNet. First, we design a novel search cell, named FusionCell, to automatically decide how multi-scale features are aggregated. Rather than searching for a single cell that is repeatedly stacked, we allow different FusionCells to flexibly integrate multi-level features. Simultaneously, considering that features generated by CNNs naturally carry both spatial and channel-wise information, we propose a new search space for efficiently focusing on the most relevant information. The search space mitigates incomplete object structures and over-predicted foreground regions caused by progressive fusion. Second, we propose a progressive polishing loss that further obtains exquisite boundaries by penalizing misalignment of salient object boundaries. Extensive experiments on five benchmark datasets demonstrate the effectiveness of the proposed method, which achieves state-of-the-art performance on four evaluation metrics. The code and results of our method are available at https://github.com/OIPLab-DUT/Auto-MSFNet.

CCS Concepts: • Computing methodologies → Interest point and salient region detections.
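The abstract does not spell out the FusionCell internals, but the general NAS pattern such cells build on can be sketched: a softmax-weighted ("mixed") combination of candidate fusion operations, where the architecture weights are learned during search and the argmax operation is kept afterwards. The candidate ops and names below are illustrative assumptions, not Auto-MSFNet's actual search space.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Candidate ways to aggregate two same-shaped feature maps. These are
# generic examples for illustration only.
CANDIDATE_OPS = {
    "sum":  lambda a, b: a + b,
    "max":  lambda a, b: np.maximum(a, b),
    "mean": lambda a, b: (a + b) / 2,
}

def mixed_fusion(feat_a, feat_b, alpha):
    """Differentiable 'mixed' fusion: a softmax-weighted combination of
    all candidate ops. During search, alpha is optimized jointly with
    the network weights; after search, the highest-weight op is kept."""
    weights = softmax(alpha)
    outs = [op(feat_a, feat_b) for op in CANDIDATE_OPS.values()]
    return sum(w * o for w, o in zip(weights, outs))

a = np.ones((2, 4, 4))             # two toy "multi-scale" feature maps
b = 3 * np.ones((2, 4, 4))
alpha = np.array([5.0, 0.0, 0.0])  # strongly prefers the "sum" op
fused = mixed_fusion(a, b, alpha)
print(fused.mean())                # close to 4.0: "sum" dominates
```

Allowing each FusionCell its own weights (rather than one shared, stacked cell) corresponds to giving each fusion stage its own independent `alpha`.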
Text-based games simulate worlds and interact with players using natural language. Recent work has used them as a testbed for autonomous language-understanding agents, motivated by the idea that understanding the meanings of words, or semantics, is a key component of how humans understand, reason, and act in these worlds. However, it remains unclear to what extent artificial agents utilize semantic understanding of the text. To this end, we perform experiments to systematically reduce the amount of semantic information available to a learning agent. Surprisingly, we find that an agent is capable of achieving high scores even in the complete absence of language semantics, indicating that the currently popular experimental setup and models may be poorly designed for understanding and leveraging game texts. To remedy this deficiency, we propose an inverse dynamics decoder to regularize the representation space and encourage exploration, which shows improved performance on several games, including ZORK I. We discuss the implications of our findings for designing future agents with stronger semantic understanding.

* Work partly done during internship at Microsoft Research. Project page: https://blindfolded.cs.princeton.edu.

(a) ZORK I
Observation 21: You are in the living room. There is a doorway to the east, a wooden door with strange gothic lettering to the west, which appears to be nailed shut, a trophy case, and a large oriental rug in the center of the room. You are carrying: A brass lantern . . .
Action 21: move rug
Observation 22: With a great effort, the rug is moved to one side of the room, revealing the dusty cover of a closed trap door... Living room... You are carrying: ...
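The "reduce the amount of semantic information" ablation described above can be illustrated with a minimal sketch. The hashing scheme here is an assumption for illustration, not necessarily the paper's exact procedure: replace every word in an observation with an opaque but consistent token, so that state identity is preserved while linguistic meaning is destroyed.

```python
import hashlib

def strip_semantics(observation: str) -> str:
    """Map each word to an opaque token derived from its hash.
    The mapping is deterministic, so identical observations still map
    to identical strings (state identity preserved), but the tokens
    carry no linguistic meaning an agent could exploit."""
    def opaque(word: str) -> str:
        return "w" + hashlib.md5(word.lower().encode()).hexdigest()[:6]
    return " ".join(opaque(w) for w in observation.split())

obs = "You are in the living room"
print(strip_semantics(obs))
```

An agent that scores equally well on the hashed observations is, by construction, not relying on word semantics, which is the phenomenon the experiments above probe.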