Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
DOI: 10.18653/v1/2021.naacl-main.378
Adapting BERT for Continual Learning of a Sequence of Aspect Sentiment Classification Tasks

Abstract: This paper studies continual learning (CL) of a sequence of aspect sentiment classification (ASC) tasks. Although some CL techniques have been proposed for document sentiment classification, we are not aware of any CL work on ASC. A CL system that incrementally learns a sequence of ASC tasks should address the following two issues: (1) transfer knowledge learned from previous tasks to the new task to help it learn a better model, and (2) maintain the performance of the models for previous tasks so that they are not forgotten.

Cited by 53 publications (60 citation statements)
References 31 publications
“…However, after three epochs RULEBERT is also at 0.0, i.e., it started to unlearn what it had learned at pre-fine-tuning (Kirkpatrick et al., 2017; Kemker et al., 2018; Biesialska et al., 2020). Learning a new task often leads to such catastrophic forgetting (Ke et al., 2021). While there are ways to alleviate this (Ke et al., 2021), this is beyond the scope of this paper.…”
Section: Results (mentioning)
confidence: 96%
“…Learning a new task often leads to such catastrophic forgetting (Ke et al., 2021). While there are ways to alleviate this (Ke et al., 2021), this is beyond the scope of this paper.…”
Section: Results (mentioning)
confidence: 98%
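The forgetting behavior described in the statements above is commonly mitigated with regularization-based methods such as Elastic Weight Consolidation (Kirkpatrick et al., 2017), one of the works cited there. The following is a minimal sketch of such a penalty, assuming PyTorch; the function name, the `lam` weight, and the way `fisher` and `old_params` are obtained are illustrative assumptions, not details taken from the cited papers.

```python
import torch

def ewc_penalty(model, fisher, old_params, lam=0.4):
    # Elastic Weight Consolidation (Kirkpatrick et al., 2017):
    # penalize moving parameters that were important for previous
    # tasks, as measured by a diagonal Fisher information estimate.
    penalty = torch.zeros((), device=next(model.parameters()).device)
    for name, param in model.named_parameters():
        if name in fisher:
            penalty = penalty + (fisher[name] * (param - old_params[name]) ** 2).sum()
    return 0.5 * lam * penalty

# When fine-tuning on a new task, the regularized objective would be
#   loss = task_loss + ewc_penalty(model, fisher, old_params)
# where fisher and old_params are snapshots taken after the previous task.
```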
“…Existing CL systems SRK (Lv et al., 2019) and KAN (Ke et al., 2020b) are for DSC in the TIL setting, not for ASC. B-CL (Ke et al., 2021) is the first CL system for ASC. It also uses the idea of Adapter-BERT in (Houlsby et al., 2019) and is based on Capsule Network.…”
Section: Related Work (mentioning)
confidence: 99%
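For readers unfamiliar with the Adapter-BERT idea referenced in the statement above, the following is a minimal sketch of a Houlsby-style bottleneck adapter, assuming PyTorch; the class name and dimensions are illustrative, and B-CL's additional capsule-network components are not shown.

```python
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    # Houlsby et al. (2019): a small bottleneck MLP with a residual
    # connection, inserted after each transformer sub-layer. Only the
    # adapters (and layer norms) are trained; BERT weights stay frozen.
    def __init__(self, hidden_size=768, bottleneck_size=64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_size, hidden_size)

    def forward(self, hidden_states):
        return hidden_states + self.up(self.act(self.down(hidden_states)))
```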
“…After learning a task, its training data is often discarded (Chen and Liu, 2018). The CL setting is useful when data privacy is a concern, i.e., the data owners do not want their data used by others (Ke et al., 2020b; Qin et al., 2020; Ke et al., 2021). In such cases, if we want to leverage the knowledge learned in the past to improve new task learning, CL is appropriate as it shares only the learned model, but not the data.…”
Section: Introduction (mentioning)
confidence: 99%
“…Another observation about the current CL research is that most techniques do not use pre-trained models. But such pre-trained models or feature extractors can significantly improve the CL performance [18,24]. An important question is how to make the best use of pre-trained models in CL.…”
Section: Introduction (mentioning)
confidence: 99%