Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022
DOI: 10.18653/v1/2022.acl-long.116

Domain Knowledge Transferring for Pre-trained Language Model via Calibrated Activation Boundary Distillation

Cited by 2 publications (2 citation statements). References 0 publications.

“…This suggests that compared to MRC mechanisms, our prompt-based approach improves the reasoning ability of the BERT-based model. Our model's accuracy in fine-grained emotion recognition is not able to match that of large language models, because the large model itself has a good emotion recognition ability (Bubeck et al. 2023) and FG-RECCON is marked by ChatGPT. But we achieve performance similar to ChatGPT in identifying causal spans, which indicates that we obtain the ability to perform causal reasoning through knowledge distillation from the relatively larger LLaMA model.…”
(Footnotes displaced from the quoted passage: 2 https://github.com/huggingface/transformers, 3 https://github.com/Lightning-AI/lightning)
Section: Results
confidence: 91%
“…The architecture of teacher-student models has also been used as a special form of transfer learning for domain migration (Choi, Choi, and Lee 2022). Recently, Large Language Models (LLMs) have shown excellent performance in generalization across various tasks (Bubeck et al. 2023). In order to improve the performance of models in specific domains, many research works have focused on distilling the knowledge of teacher LLMs into student models.…”
Section: Prompt Learning and Knowledge Distillation
confidence: 99%
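
To make the teacher-student setup mentioned in the statement above concrete, here is a minimal sketch of vanilla response-based knowledge distillation in PyTorch. It is an illustration only, under assumed names (teacher, student, temperature, alpha); it is not the calibrated activation boundary distillation method of the indexed paper.

```python
# Minimal sketch of response-based knowledge distillation:
# soft-label KL between teacher and student logits plus hard-label
# cross-entropy. Names and hyperparameters here are illustrative.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend temperature-scaled KL (soft targets) with cross-entropy (hard labels)."""
    # Soft targets come from the frozen teacher's logits.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale the KL term by T^2 so its gradient magnitude stays comparable.
    kd = F.kl_div(log_probs, soft_targets, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# Usage sketch: the teacher stays in eval mode and is never updated.
# with torch.no_grad():
#     teacher_logits = teacher(input_ids).logits
# loss = distillation_loss(student(input_ids).logits, teacher_logits, labels)
```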