Proceedings of the Graph-Based Methods for Natural Language Processing (TextGraphs) 2020
DOI: 10.18653/v1/2020.textgraphs-1.5
Contextual BERT: Conditioning the Language Model Using a Global State

Abstract: BERT is a popular language model whose main pre-training task is to fill in the blank, i.e., predicting a word that was masked out of a sentence, based on the remaining words. In some applications, however, having an additional context can help the model make the right prediction, e.g., by taking the domain or the time of writing into account. This motivates us to advance the BERT architecture by adding a global state for conditioning on a fixed-sized context. We present our two novel approaches and apply them…
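The abstract describes conditioning BERT's masked-word prediction on a fixed-size global context (e.g., domain or time of writing). Below is a minimal PyTorch sketch of that general idea, assuming the context vector is projected into the model's hidden space and added to every token embedding before the encoder; the class and parameter names (GlobalStateConditioner, context_dim) are illustrative only and do not reproduce the paper's two specific approaches.

import torch
import torch.nn as nn

class GlobalStateConditioner(nn.Module):
    """Adds a projected fixed-size context vector ("global state") to every
    token embedding before it enters a BERT-like encoder.
    Illustrative sketch, not the paper's exact architecture."""

    def __init__(self, context_dim: int, hidden_dim: int):
        super().__init__()
        # Project the external context (e.g., encoded domain or time features)
        # into the encoder's hidden space.
        self.context_proj = nn.Linear(context_dim, hidden_dim)

    def forward(self, token_embeddings: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, hidden_dim)
        # context:          (batch, context_dim)  -- fixed-size global state
        global_state = self.context_proj(context)            # (batch, hidden_dim)
        return token_embeddings + global_state.unsqueeze(1)  # broadcast over all positions

# Usage sketch: condition a toy embedding sequence on a 16-dimensional context vector.
conditioner = GlobalStateConditioner(context_dim=16, hidden_dim=768)
tokens = torch.randn(2, 128, 768)   # stand-in for BERT token embeddings
context = torch.randn(2, 16)        # e.g., domain / time-of-writing features
conditioned = conditioner(tokens, context)
print(conditioned.shape)            # torch.Size([2, 128, 768])

The conditioned embeddings would then be fed to the transformer layers and the masked-token head as usual, so the prediction of the masked word can depend on the global state as well as the remaining words.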

Cited by 4 publications (1 citation statement, published 2021–2023) · References 6 publications
“…This restricts the applicability of trained models to a real-world scenario as the model's behavior and predictions become very specific to the type of data they are trained on. To conquer this, many studies have recently evolved that focus on building models that can incorporate world knowledge for enhanced modeling and inference on the task at hand, such as He et al (2020) for NER, Denk and Peleteiro Ramallo (2020) for Representation Learning and Kim et al (2015) for Dependency Parsing, etc. Although recent works in literature have successfully incorporated world knowledge for Sequence Labeling (He et al 2020), they come with certain limitations, which we discuss ahead. First, as words in a language can be polysemous (Lin et al 2002), entities and relations in a knowledge graph can be polysemous too (Xiao, Huang, and Zhu 2016).…”
Section: Introduction (mentioning, confidence: 99%)