Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d17-1195

Dynamic Entity Representations in Neural Language Models

Abstract: Understanding a long document requires tracking how entities are introduced and evolve over time. We present a new type of language model, ENTITYNLM, that can explicitly model entities, dynamically update their representations, and contextually generate their mentions. Our model is generative and flexible; it can model an arbitrary number of entities in context while generating each entity mention at an arbitrary length. In addition, it can be used for several different tasks such as language modeling, coreference resolution, and entity prediction.
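The dynamic update the abstract describes can be sketched in a few lines. The following is a minimal illustration assuming an interpolation-gate update with unit-norm embeddings; the bilinear gate and all parameter names are assumptions for exposition, not the paper's exact equations.

```python
import torch
import torch.nn.functional as F

def update_entity_embedding(entity_emb, hidden, gate_weight):
    """One dynamic-update step for an entity representation.

    Sketch of the idea in the abstract: when an entity is mentioned,
    its embedding is interpolated with the current hidden state and
    renormalized. The bilinear gate and renormalization here are
    illustrative assumptions, not the paper's exact formulation.
    """
    # Bilinear gate: how strongly the new context should overwrite
    # the stored entity representation.
    delta = torch.sigmoid(entity_emb @ gate_weight @ hidden)
    updated = (1 - delta) * entity_emb + delta * hidden
    # Keep entity embeddings on the unit sphere so repeated updates
    # do not grow their norm without bound.
    return F.normalize(updated, dim=0)

# Toy usage: one entity embedding and one hidden state, both 8-dim.
d = 8
entity_emb = F.normalize(torch.randn(d), dim=0)
hidden = torch.randn(d)
gate_weight = torch.randn(d, d) * 0.1
entity_emb = update_entity_embedding(entity_emb, hidden, gate_weight)
print(entity_emb.shape)  # torch.Size([8])
```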

Cited by 84 publications (108 citation statements). References 27 publications.

“…State-of-the-art approaches on these tasks are inherently entity-centric. Separately, it has been shown that entity-centric language modeling in a continuous framework can lead to better performance for LM-related tasks (Ji et al., 2017). Moreover, external data has been shown to be useful for modeling process understanding tasks in prior work, suggesting that pre-trained models may be effective.…”
Section: Background: Process Understanding (mentioning)
confidence: 99%
“…Perplexity We evaluate our model using the standard perplexity metric: $\exp\left(-\frac{1}{T}\sum_{t=1}^{T}\log p(x_t)\right)$. However, perplexity suffers from the issue that it overestimates the probability of out-of-vocabulary tokens when they are mapped to a single UNK token. This is problematic for comparing the performance of the KGLM to traditional language models on Linked WikiText-2 since there are a large number of rare entities whose alias tokens are out-of-vocabulary.…”
[Results table excerpt spilled into the quote (model: PPL, UPP): ENTITYNLM* (Ji et al., 2017): 85.4, 189.2; EntityCopyNet*: 76.1, 144.0.]
Section: Results (mentioning)
confidence: 99%
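To make the quoted metric concrete, here is a small self-contained computation of perplexity from per-token probabilities; the probabilities are toy values, not results from any of the papers above.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(-(1/T) * sum over t of log p(x_t)).

    Toy illustration of the metric quoted above; the input
    probabilities are made up, not taken from any model.
    """
    T = len(token_probs)
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / T
    return math.exp(avg_neg_log_prob)

# A model that assigns these probabilities to a 4-token sequence:
probs = [0.2, 0.1, 0.05, 0.4]
print(round(perplexity(probs), 2))  # 7.07
```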
“…• AWD-LSTM (Merity et al., 2018): a strong LSTM-based model used as the foundation of most state-of-the-art models on WikiText-2. • ENTITYNLM (Ji et al., 2017): an LSTM-based language model with the ability to track entity mentions. Embeddings for entities are created dynamically, and are not informed by any external sources of information.…”
Section: Evaluation Setup (mentioning)
confidence: 99%
“…Recently, entity tracking has been popular for generating coherent text (Kiddon et al., 2016; Ji et al., 2017; Clark et al., 2018). Kiddon et al. (2016) proposed a neural checklist model that updates predefined item states.…”
Section: Memory Modules (mentioning)
confidence: 99%
“…Kiddon et al (2016) proposed a neural checklist model that updates predefined item states. Ji et al (2017) proposed an entity representation for the language model. Updating entity tracking states when the entity is introduced, their method selects the salient entity state.…”
Section: Memory Modulesmentioning
confidence: 99%
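The selection step described in this last quote can be read as scoring a memory of dynamically updated entity states against the current context. Below is a minimal sketch assuming dot-product scoring with a softmax; the function and variable names are illustrative, not the cited papers' exact method.

```python
import torch

def select_salient_entity(entity_memory, hidden):
    """Pick the most salient entity for the current context.

    entity_memory: (num_entities, d) matrix of dynamically updated
    entity embeddings; hidden: (d,) current LSTM hidden state.
    Dot-product scoring is an illustrative assumption.
    """
    scores = entity_memory @ hidden           # (num_entities,)
    weights = torch.softmax(scores, dim=0)    # soft saliency distribution
    salient_idx = int(torch.argmax(weights))  # hard choice for generation
    return salient_idx, weights

# Toy usage: three tracked entities with 8-dim states.
memory = torch.randn(3, 8)
h = torch.randn(8)
idx, w = select_salient_entity(memory, h)
print(idx, w.sum().item())  # an index in {0, 1, 2}, weights sum to 1.0
```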