Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021), 2021
DOI: 10.18653/v1/2021.acl-long.259

Taming Pre-trained Language Models with N-gram Representations for Low-Resource Domain Adaptation

Abstract: Large pre-trained models such as BERT are known to improve different downstream NLP tasks, even when such a model is trained on a generic domain. Moreover, recent studies have shown that when large domain-specific corpora are available, continued pre-training on domain-specific data can further improve the performance of in-domain tasks. However, this practice requires significant domain-specific data and computational resources, which may not always be available. In this paper, we aim to adapt a generic pre-trained…
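A hedged sketch of the general idea the abstract describes: augmenting a generic pre-trained encoder with embeddings of domain n-grams mined from a small in-domain corpus. The class name NGramAdapter, the tensor shapes, and the coverage-mask scheme below are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class NGramAdapter(nn.Module):
    """Illustrative module: fuse domain n-gram embeddings into sub-word token states."""

    def __init__(self, hidden_size: int, ngram_vocab_size: int):
        super().__init__()
        # One trainable vector per n-gram mined from the small in-domain corpus (assumed setup).
        self.ngram_embeddings = nn.Embedding(ngram_vocab_size, hidden_size, padding_idx=0)
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, token_states, ngram_ids, ngram_token_mask):
        # token_states:     (batch, seq_len, hidden)     output of a generic pre-trained encoder
        # ngram_ids:        (batch, max_ngrams)           ids of n-grams matched in each sentence
        # ngram_token_mask: (batch, max_ngrams, seq_len)  1 where an n-gram covers a token
        ngram_vecs = self.ngram_embeddings(ngram_ids)            # (B, N, H)
        mask = ngram_token_mask.float()
        # Sum the vectors of all n-grams covering each token, then average by coverage.
        fused = torch.einsum("bns,bnh->bsh", mask, ngram_vecs)
        coverage = mask.sum(dim=1).clamp(min=1.0).unsqueeze(-1)  # (B, S, 1)
        return self.norm(token_states + fused / coverage)

# Toy usage with random tensors standing in for real encoder outputs.
adapter = NGramAdapter(hidden_size=768, ngram_vocab_size=10_000)
token_states = torch.randn(2, 16, 768)
ngram_ids = torch.randint(1, 10_000, (2, 4))
mask = torch.zeros(2, 4, 16)
mask[:, 0, 2:5] = 1.0            # pretend the first n-gram spans tokens 2-4
out = adapter(token_states, ngram_ids, mask)
print(out.shape)                 # torch.Size([2, 16, 768])
```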

Cited by 17 publications (12 citation statements) · References 36 publications

Citation statements:
“…Recently, Domain Adaptation (DA) has gained popularity in the machine learning field due to its demonstrated effectiveness in enhancing model prediction performance, especially when dealing with unlabeled test samples. This applicability extends to domains like computer vision [36,54,56], as well as natural language processing [10,29,50]. As shown in Fig.…”
Section: (B) · Citation type: mentioning · Confidence: 94%
“…According to the nature of their prompt tokens, existing prompt tuning methods can be categorized into two types: 1) discrete prompts (Wallace et al., 2019; Shin et al., 2020; Yuan et al., 2021; Haviv et al., 2021; Gao et al., 2021; Ben-David et al., 2021; Davison et al., 2019; Su et al., 2022; Diao et al., 2022) and 2) continuous prompts (Zhong et al., 2021; Qin and Eisner, 2021; Hambardzumyan et al., 2021; Han et al., 2021; Li and Liang, 2021). Discrete prompts optimize a sequence of discrete tokens, while continuous prompts optimize a sequence of vectors, similar to adapter-based tuning (Houlsby et al., 2019; Pfeiffer et al., 2020; Diao et al., 2020, 2021). Our research is highly relevant to exemplar-based in-context learning and discrete prompts research.…”
Section: Prompt-based Learning · Citation type: mentioning · Confidence: 98%
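The distinction quoted above between discrete and continuous prompts can be made concrete with a small sketch. The snippet below is an assumed illustration rather than a re-implementation of any cited method: the discrete prompt is ordinary, human-readable text, while the continuous prompt is a block of trainable vectors concatenated to the model's input embeddings.

```python
import torch
import torch.nn as nn

# Discrete prompt: real tokens in a hand-written template; it can be searched or
# edited, but not optimized by back-propagation.
discrete_prompt = "Review: {text} Sentiment: [MASK]"

class ContinuousPrompt(nn.Module):
    """Prepends `prompt_len` learnable vectors to (frozen) input embeddings."""

    def __init__(self, prompt_len: int, hidden_size: int):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(prompt_len, hidden_size) * 0.02)

    def forward(self, input_embeds):                     # (batch, seq_len, hidden)
        batch = input_embeds.size(0)
        soft = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([soft, input_embeds], dim=1)    # (batch, prompt_len + seq_len, hidden)

# Toy usage: random tensors stand in for a PLM's input embeddings.
embeds = torch.randn(4, 32, 768)
prompted = ContinuousPrompt(prompt_len=10, hidden_size=768)(embeds)
print(prompted.shape)                                    # torch.Size([4, 42, 768])
```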
“…Knowledge can be injected into PLMs by pre-training or fine-tuning, each corresponding to a separate research direction. During pre-training, the knowledge carried by knowledge graphs [49,53,54], entities [49,55], n-grams [56], knowledge embeddings [57], synonym and hyponym-hypernym relations in WordNet [19], word-supersense knowledge [58], and knowledge bases [59-61] can be injected into PLMs by feeding knowledge inputs and designing new objectives. Furthermore, in terms of event pre-training, Majewska et al. investigated the potential of leveraging knowledge about the semantic-syntactic behaviour of verbs to improve the capacity of large pre-trained models to reason about events in diverse languages [28].…”
Section: Injecting Knowledge Into LMs · Citation type: mentioning · Confidence: 99%
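As a rough, hypothetical illustration of "feeding knowledge inputs and designing new objectives", the sketch below adds an auxiliary head that predicts which knowledge item (an n-gram or entity id) a pooled span corresponds to, with its loss intended to be summed with the usual masked-language-modelling loss. The head, vocabulary size, and loss combination are assumptions, not a specific objective from any cited work.

```python
import torch
import torch.nn as nn

class KnowledgePredictionHead(nn.Module):
    """Auxiliary pre-training head: classify each pooled span into a knowledge vocabulary."""

    def __init__(self, hidden_size: int, knowledge_vocab_size: int):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, knowledge_vocab_size)
        self.loss_fn = nn.CrossEntropyLoss(ignore_index=-100)  # -100 marks unlabeled spans

    def forward(self, span_states, knowledge_labels):
        # span_states:      (batch, n_spans, hidden)  pooled representations of masked spans
        # knowledge_labels: (batch, n_spans)          id of the matched n-gram/entity, or -100
        logits = self.classifier(span_states)
        return self.loss_fn(logits.view(-1, logits.size(-1)), knowledge_labels.view(-1))

# Toy usage with random tensors; in practice span_states would come from the encoder.
head = KnowledgePredictionHead(hidden_size=768, knowledge_vocab_size=5_000)
aux_loss = head(torch.randn(2, 3, 768), torch.randint(0, 5_000, (2, 3)))
# total_loss = mlm_loss + aux_loss   # combined pre-training objective (illustrative only)
```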