Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1134
Fine-tuning Pre-Trained Transformer Language Models to Distantly Supervised Relation Extraction

Abstract: Distantly supervised relation extraction is widely used to extract relational facts from text, but suffers from noisy labels. Current relation extraction methods try to alleviate the noise by multi-instance learning and by providing supporting linguistic and contextual information to more efficiently guide the relation classification. While achieving state-of-the-art results, we observed these models to be biased towards recognizing a limited set of relations with high precision, while ignoring those in the long tail.
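The core recipe the abstract describes is straightforward to sketch. Below is a minimal illustration (not the authors' released code) of fine-tuning a pre-trained GPT-style language model for relation classification with the Hugging Face transformers library; the entity-marker tokens, label-set size, label index, and learning rate are assumptions, and the bag-level multi-instance handling mentioned in the abstract is omitted for brevity.

    # Minimal sketch (assumptions marked) of fine-tuning a pre-trained
    # Transformer LM for relation classification; the paper's bag-level
    # multi-instance noise handling is omitted here.
    import torch
    from torch.optim import AdamW
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    NUM_RELATIONS = 53  # assumption: NYT-style distant-supervision label set

    tokenizer = AutoTokenizer.from_pretrained("openai-gpt")
    model = AutoModelForSequenceClassification.from_pretrained(
        "openai-gpt", num_labels=NUM_RELATIONS
    )

    # Hypothetical entity-marker convention; the paper's exact input
    # formatting may differ.
    markers = {"additional_special_tokens": ["[E1]", "[/E1]", "[E2]", "[/E2]"]}
    tokenizer.add_special_tokens(markers)
    model.resize_token_embeddings(len(tokenizer))

    sentence = "[E1] Barack Obama [/E1] was born in [E2] Hawaii [/E2] ."
    inputs = tokenizer(sentence, return_tensors="pt")
    label = torch.tensor([7])  # hypothetical index of a place_of_birth relation

    optimizer = AdamW(model.parameters(), lr=6.25e-5)  # LR from the GPT fine-tuning recipe
    model.train()
    loss = model(**inputs, labels=label).loss
    loss.backward()
    optimizer.step()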

Cited by 109 publications (91 citation statements) · References 33 publications
“…Second, we investigated the number of reports needed for accurate classification of our pathology reports by varying the size of the training set from 16 to 256 across both classification and extraction. While others have performed sample-efficiency analyses of NLP algorithms across many tasks [28–30], to our knowledge this has not been investigated for the important application of clinical information extraction from pathology reports, with the exception of Yala et al., who plot dataset size vs. performance for only one method (boosting) and only over fields that take 2 values [6]. Overall, we found that only 128 labeled reports were needed for the best classification methods and only 64 for the token extractor, a small number compared to the dataset sizes used in prior work.…”
Section: Discussion (mentioning)
Confidence: 99%
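The protocol quoted above (sweep the number of labeled training examples and re-evaluate) is easy to replicate in outline. The sketch below is illustrative, not the cited study's pipeline: it assumes a generic text-classification field, at least 256 labeled reports, and a simple bag-of-words baseline in place of the study's models.

    # Illustrative learning-curve sketch for the sample-efficiency protocol:
    # train on increasing subsets of labeled reports, track test performance.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score
    from sklearn.pipeline import make_pipeline

    def learning_curve(train_texts, train_labels, test_texts, test_labels, seed=0):
        rng = np.random.default_rng(seed)
        scores = {}
        for n in (16, 32, 64, 128, 256):  # training-set sizes from the quote
            idx = rng.choice(len(train_texts), size=n, replace=False)
            clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
            clf.fit([train_texts[i] for i in idx], [train_labels[i] for i in idx])
            scores[n] = f1_score(test_labels, clf.predict(test_texts), average="macro")
        return scores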
“…It has also been shown that the Transformer used in GPT and BERT is better at extracting features than the LSTM. Alt, Hübner, and Hennig (2019) utilized the Transformer decoder from GPT for distantly supervised relation extraction and obtained state-of-the-art results.…”
Section: Related Work (mentioning)
Confidence: 99%
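In a GPT-style (causal) decoder, only the final position attends to every earlier token, so its hidden state is the natural sentence-level feature for a relation classifier head to consume. A minimal sketch of this, assuming the openai-gpt checkpoint from Hugging Face as a stand-in for the cited setup:

    # Sketch: last-token hidden state of a causal Transformer decoder as a
    # fixed sentence feature (the vector a relation classifier would consume).
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("openai-gpt")
    lm = AutoModel.from_pretrained("openai-gpt")

    enc = tokenizer("Steve Jobs co-founded Apple in Cupertino .", return_tensors="pt")
    with torch.no_grad():
        hidden = lm(**enc).last_hidden_state  # shape: (1, seq_len, 768)
    sentence_feature = hidden[0, -1]          # only the last token sees the full sentence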
“…However, none of the above methods pay attention to the biased and inaccurate test set. Though human evaluation can yield accurate results (Zeng et al., 2015; Alt et al., 2019), labeling all the instances in the test set is too costly.…”
Section: Related Work (mentioning)
Confidence: 99%