Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) 2018
DOI: 10.18653/v1/p18-2015

Ranking-Based Automatic Seed Selection and Noise Reduction for Weakly Supervised Relation Extraction

Abstract: This paper addresses the tasks of automatic seed selection for bootstrapping relation extraction, and noise reduction for distantly supervised relation extraction. We first point out that these tasks are related. Then, inspired by ranking relation instances and patterns computed by the HITS algorithm, and selecting cluster centroids using the K-means, LSA, or NMF method, we propose methods for selecting the initial seeds from an existing resource, or reducing the level of noise in the distantly labeled data. E…
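
The HITS-based ranking mentioned in the abstract can be pictured with a small sketch. It assumes a bipartite graph that links relation instances (entity pairs) to the textual patterns they co-occur with, treats instances as hubs and patterns as authorities, and takes the top-ranked instances as seeds. These modelling choices, the function names `hits_rank` and `select_seeds`, and the input format are assumptions made for illustration, not the authors' implementation.

```python
import math


def _normalize(scores):
    """Scale a score dictionary to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {key: value / norm for key, value in scores.items()}


def hits_rank(pairs, iterations=50):
    """Run HITS on a bipartite instance-pattern graph.

    `pairs` is an iterable of (instance, pattern) co-occurrences.
    Returns (hub scores for instances, authority scores for patterns).
    """
    pairs = list(pairs)
    hub = {inst: 1.0 for inst, _ in pairs}
    auth = {pat: 1.0 for _, pat in pairs}
    for _ in range(iterations):
        # Authority update: a pattern scores high if high-hub instances match it.
        new_auth = dict.fromkeys(auth, 0.0)
        for inst, pat in pairs:
            new_auth[pat] += hub[inst]
        auth = _normalize(new_auth)
        # Hub update: an instance scores high if it matches high-authority patterns.
        new_hub = dict.fromkeys(hub, 0.0)
        for inst, pat in pairs:
            new_hub[inst] += auth[pat]
        hub = _normalize(new_hub)
    return hub, auth


def select_seeds(pairs, k):
    """Return the k instances with the highest hub score (assumed seed criterion)."""
    hub, _ = hits_rank(pairs)
    return sorted(hub, key=lambda inst: (-hub[inst], inst))[:k]
```

Calling `select_seeds(pairs, k)` on a list of (entity pair, pattern) tuples returns the `k` highest-ranked entity pairs. The same scores could instead be used to drop low-ranked instances from distantly labeled data, which is how the abstract relates the two tasks.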

Cited by 9 publications (8 citation statements)
References 19 publications (13 reference statements)
“…Zeng et al [15] introduce a path-based neural extraction model to encode the relational semantic information from both direct sentences and inference chains that can be built between two target entities via intermediate entities. Motivated by the hypertext-induced topic search (HITS) [16] algorithm, and selecting cluster centroids method such as K-means, latent semantic analysis (LSA) [17], or nonnegative matrix factorization (NMF) [18], Phi et al [19] formulate wrong label reduction tasks as ranking problems according to different ranking criteria. He et al [20] divide the original classification task into subtasks in different levels and construct a tree-like categorization structure.…”
Section: Related Work (mentioning)
confidence: 99%
“…The dataset used comes from the study conducted by [13]. In that study, the data were collected from various sources and processed so that the concepts and the associated meronymy relation types had already been identified.…”
Section: Research Method (unclassified)
“…At this stage, the training data used come purely from the previous study [13]. In line with the relations defined in [9], all words or phrases that indicate a relation between entities in a sentence are obtained, together with the type of relation indicated.…”
Section: A. Pattern Extraction (unclassified)
“…Generally, a bootstrapping approach starts with a classifier trained with the initial corpus which is manually annotated (also called seed corpus) and then gradually improves the accuracy of the classifier through several re-training processes. Thus it is important to obtain the high quality of the initial corpus because the performance of the bootstrapping approach seriously depends on the training data for the initial classifier [27], [28]. However, it is sometimes hard to manually annotate sufficient amounts of the initial corpus.…”
Section: A. How To Automatically Generate An Initial Corpus (mentioning)
confidence: 99%
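
The bootstrapping loop described in the last citation statement (train on a seed corpus, then retrain repeatedly on newly labeled data) can be sketched as a generic self-training routine. The names `bootstrap` and `train_centroids`, the confidence threshold, and the toy one-dimensional nearest-centroid classifier are all assumptions chosen to keep the example self-contained. None of it is taken from the cited papers.

```python
def train_centroids(labeled):
    """Toy trainer: nearest-centroid classifier over 1-D float features."""
    grouped = {}
    for x, label in labeled:
        grouped.setdefault(label, []).append(x)
    centroids = {label: sum(xs) / len(xs) for label, xs in grouped.items()}

    def model(x):
        # Rank labels by distance to their centroid; confidence is a margin in [0.5, 1].
        ranked = sorted(centroids, key=lambda label: abs(x - centroids[label]))
        if len(ranked) == 1:
            return ranked[0], 1.0
        near = abs(x - centroids[ranked[0]])
        far = abs(x - centroids[ranked[1]])
        score = 1.0 if near + far == 0 else 1.0 - near / (near + far)
        return ranked[0], score

    return model


def bootstrap(seed, unlabeled, train, rounds=3, threshold=0.8):
    """Self-training: grow the labeled set with confident predictions, then retrain."""
    labeled = list(seed)
    pool = list(unlabeled)
    model = train(labeled)
    for _ in range(rounds):
        confident, remaining = [], []
        for x in pool:
            label, score = model(x)
            if score >= threshold:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing new to learn from; stop early
        labeled.extend(confident)
        pool = remaining
        model = train(labeled)
    return model, labeled
```

Because every later round is trained on labels produced from the seed model, a poor seed corpus propagates its errors. That is the dependence on initial-corpus quality the statement points to.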