Proceedings of the 28th International Conference on Computational Linguistics 2020
DOI: 10.18653/v1/2020.coling-main.472
Scientific Keyphrase Identification and Classification by Pre-Trained Language Models Intermediate Task Transfer Learning

Abstract: Scientific keyphrase identification and classification is the task of detecting keyphrases in scholarly text and classifying them with types from a set of predefined classes. The task has a wide range of benefits, but its performance remains limited by the scarcity of the large amounts of labeled data required to train deep neural models. To overcome this challenge, we explore the pre-trained language models BERT and SciBERT with intermediate task transfer learning, using 42 data-rich related i…
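The intermediate-task transfer recipe the abstract describes amounts to two fine-tuning stages: first on a data-rich related task, then on the low-resource target task. Below is a minimal sketch assuming Hugging Face `transformers`; the intermediate task, label counts, and checkpoint paths are illustrative placeholders, not the paper's actual experimental setup.

```python
# Minimal sketch of intermediate-task transfer learning: fine-tune on a
# data-rich intermediate task first, then on the low-resource target task.
# Assumes Hugging Face `transformers`; labels and paths are placeholders.
from transformers import AutoModelForTokenClassification, AutoTokenizer

BASE = "allenai/scibert_scivocab_uncased"  # SciBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(BASE)

# Step 1: fine-tune the encoder on an intermediate token-labeling task
# (e.g., a generic NER dataset with 9 BIO tags).
intermediate = AutoModelForTokenClassification.from_pretrained(BASE, num_labels=9)
# ... standard fine-tuning loop on the intermediate task goes here ...
intermediate.save_pretrained("scibert-intermediate")

# Step 2: keep the transferred encoder, swap in a fresh classification head,
# and fine-tune on the target task (keyphrase identification and
# classification framed as token labeling).
target = AutoModelForTokenClassification.from_pretrained(
    "scibert-intermediate",
    num_labels=7,                  # e.g., B-/I- tags per keyphrase type + O
    ignore_mismatched_sizes=True,  # discard the intermediate task's head
)
# ... fine-tune `target` on the labeled keyphrase data ...
```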

Cited by 15 publications (9 citation statements) | References 25 publications
“…Ye et al (2021b) propose to dynamically align target phrases to eliminate the influence of order, as highlighted by Meng et al (2021). Mu et al (2020); Liu et al (2020a); Park and Caragea (2020) use pre-trained language models for better representations of documents.…”
Section: Related Work (mentioning; confidence: 99%)
“…It can not only provide users a quick view of result documents (similar to document summarization) but may also benefit downstream tasks such as document indexing, document recommendation, and query suggestion. Most of them formulated keyphrase extraction as a sequential labeling task (Lim et al, 2020; Wu et al, 2021; Park and Caragea, 2020; Sahrawat et al, 2020; Liu et al, 2020). For example, some work (Sahrawat et al, 2020; Park and Caragea, 2020) adopted contextualized embeddings generated by BERT or SciBERT (Beltagy et al, 2019) as the input of their BiLSTM-CRF architecture for scientific keyphrase extraction.…”
Section: Keyphrase Extraction (mentioning; confidence: 99%)
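As a concrete illustration of the architecture this statement describes (contextualized BERT/SciBERT embeddings feeding a BiLSTM-CRF tagger), here is a minimal PyTorch sketch; it assumes the third-party `pytorch-crf` package, and the tag count and hidden size are illustrative, not those of any cited paper.

```python
# Sketch of a BiLSTM-CRF over a pre-trained encoder for keyphrase tagging.
# Assumes PyTorch, `transformers`, and the `pytorch-crf` package.
import torch.nn as nn
from torchcrf import CRF
from transformers import AutoModel

class BertBiLSTMCRF(nn.Module):
    def __init__(self, encoder_name="allenai/scibert_scivocab_uncased",
                 num_tags=7, lstm_hidden=256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        self.lstm = nn.LSTM(self.encoder.config.hidden_size, lstm_hidden,
                            batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * lstm_hidden, num_tags)  # emission scores
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        # Contextualized token embeddings -> BiLSTM -> per-tag emissions.
        hidden = self.encoder(input_ids,
                              attention_mask=attention_mask).last_hidden_state
        emissions = self.proj(self.lstm(hidden)[0])
        mask = attention_mask.bool()
        if tags is not None:
            # Training: negative log-likelihood under the CRF.
            return -self.crf(emissions, tags, mask=mask)
        # Inference: Viterbi-decoded best tag sequence per example.
        return self.crf.decode(emissions, mask=mask)
```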
“…Most of them formulated keyphrase extraction as a sequential labeling task (Lim et al, 2020; Wu et al, 2021; Park and Caragea, 2020; Sahrawat et al, 2020; Liu et al, 2020). For example, some work (Sahrawat et al, 2020; Park and Caragea, 2020) adopted contextualized embeddings generated by BERT or SciBERT (Beltagy et al, 2019) as the input of their BiLSTM-CRF architecture for scientific keyphrase extraction. Tang et al (2019) used BERT with an attention layer to automatically extract keywords from clinical notes.…”
Section: Keyphrase Extraction (mentioning; confidence: 99%)
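The sequential-labeling formulation these statements refer to reduces to per-token BIO tags. The toy example below shows the data format; the sentence and the keyphrase types are invented for illustration.

```python
# BIO tagging: B- marks the first token of a keyphrase, I- a continuation,
# and O a token outside any keyphrase. Types ("Method", "Material") and the
# sentence are invented for illustration.
tokens = ["We", "train", "a", "conditional", "random", "field", "on", "PubMed"]
tags   = ["O", "O", "O", "B-Method", "I-Method", "I-Method", "O", "B-Material"]
```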
“…(topic_t, type_c), with type_c ∈ {task, algorithm, data} and topic_t ∈ T, where T represents the list of topics generated by the Latent Dirichlet Allocation (LDA) model (Blei et al, 2003). These topic categories can be useful beyond our field and application, for example in question answering systems or paper recommendation systems (Augenstein et al, 2017; Park and Caragea, 2020; Luan et al, 2018; QasemiZadeh and Schumann, 2016). In order to identify the topics occurring in our corpus of scientific texts, we first train an LDA model on the full texts extracted from computational linguistics articles, and use it to extract a set of 100 topics which we will use to analyze the evolution of the field in the next stages of our study.…”
Section: Representation Of Ideas (mentioning; confidence: 99%)
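The topic-extraction step quoted above (train LDA on article full texts, keep 100 topics) could be reproduced along these lines with gensim; the toy documents stand in for the tokenized full texts and are placeholders, not the cited corpus.

```python
# Sketch of the LDA topic-extraction step, assuming gensim.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

documents = [  # placeholder tokenized full texts
    ["keyphrase", "identification", "scientific", "text", "classification"],
    ["latent", "dirichlet", "allocation", "topic", "model", "corpus"],
]
dictionary = Dictionary(documents)                       # token -> id mapping
corpus = [dictionary.doc2bow(doc) for doc in documents]  # bag-of-words vectors

# 100 topics, matching the quoted study's setting.
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=100, passes=10)

# Each document (or extracted phrase context) can then be assigned its most
# probable topic_t and paired with a type_c from {task, algorithm, data}.
print(lda.get_document_topics(corpus[0]))
```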