Ranking-Based Automatic Seed Selection and Noise Reduction for Weakly Supervised Relation Extraction

Phi, Van-Thuy; Santoso, Joan; Shimbo, Masashi; Matsumoto, Yūji

doi:10.18653/v1/p18-2015

Cited by 9 publications

(8 citation statements)

References 19 publications

(13 reference statements)

“…Zeng et al [15] introduce a path-based neural extraction model to encode the relational semantic information from both direct sentences and inference chains that can be built between two target entities via intermediate entities. Motivated by the hypertext-induced topic search (HITS) [16] algorithm, and selecting cluster centroids method such as K-means, latent semantic analysis (LSA) [17], or nonnegative matrix factorization (NMF) [18], Phi et al [19] formulate wrong label reduction tasks as ranking problems according to different ranking criteria. He et al [20] divide the original classification task into subtasks in different levels and construct a tree-like categorization structure.…”

Section: Related Work (mentioning)

confidence: 99%

Distant Supervision for Relation Extraction with Sentence Selection and Interaction Representation

Chen, Wang, et al. 2021

Wireless Communications and Mobile Computing

Distant supervision (DS) has been widely used for relation extraction (RE), which automatically generates large-scale labeled data. However, there is a wrong labeling problem, which affects the performance of RE. Besides, the existing method suffers from the lack of useful semantic features for some positive training instances. To address the above problems, we propose a novel RE model with sentence selection and interaction representation for distantly supervised RE. First, we propose a pattern method based on the relation trigger words as a sentence selector to filter out noisy sentences to alleviate the wrong labeling problem. After clean instances are obtained, we propose the interaction representation using the word-level attention mechanism-based entity pairs to dynamically increase the weights of the words related to entity pairs, which can provide more useful semantic information for relation prediction. The proposed model outperforms the strongest baseline by 2.61 in F1-score on a widely used dataset, which proves that our model performs significantly better than the state-of-the-art RE systems.
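The sentence-selection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the trigger-word table `TRIGGERS`, the relation names, and the function `select_sentences` are hypothetical, and plain lowercase token matching stands in for whatever pattern method the paper actually uses.

```python
import re

# Hypothetical trigger words per relation (an assumption, not taken from the paper).
TRIGGERS = {
    "founder_of": {"founded", "founder", "established"},
    "born_in": {"born", "birthplace"},
}

def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())

def select_sentences(instances, triggers=TRIGGERS):
    """Keep distantly labeled (sentence, relation) pairs whose sentence
    contains at least one trigger word for the assigned relation."""
    kept = []
    for sentence, relation in instances:
        words = set(tokenize(sentence))
        if words & triggers.get(relation, set()):
            kept.append((sentence, relation))
    return kept

# Toy distantly supervised instances; the second one carries a noisy label.
instances = [
    ("Steve Jobs founded Apple in 1976.", "founder_of"),
    ("Steve Jobs returned to Apple in 1997.", "founder_of"),
    ("Marie Curie was born in Warsaw.", "born_in"),
]
clean = select_sentences(instances)
```

In this toy run the second sentence is dropped because it mentions both entities but contains no trigger word for `founder_of`, which is the kind of wrong label the selector is meant to filter out.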

“…The dataset used comes from the research conducted by [13]. In that study, the data were collected from various sources and processed so that the concepts and the associated meronymy relation types had already been identified.…”

Section: Research Method (unclassified)

“…At this stage, the training data used come solely from the previous study [13]. In accordance with the relations defined in [9], all words or phrases that indicate a relation between entities in a sentence were obtained, together with the type of relation indicated.…”

Section: A. Pattern Extraction (unclassified)

Ekstraksi Relasi Meronymy dengan Lexico-Syntactic Patterns

Kardinata, Rakhmawati 2020

JEPIN

An ontology consists of concepts and relations, each of which can be extracted with a variety of methods. One method that can be used for relation extraction is based on Lexico-Syntactic Patterns. Put simply, relation extraction is performed by obtaining a pattern that indicates a relation. An experiment is then carried out to test whether the obtained pattern can predict the relation correctly. In this study, experiments were conducted to test meronymy relation patterns obtained from the dataset of a previous study. Evaluation was performed using recall and precision. The study found that the amount (diversity) of variation within a set of patterns indicating a relation can affect the ability of that pattern set to predict the relation correctly. The more pattern variations there are for a single relation, the more prediction accuracy tends to decrease.
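The recall and precision evaluation mentioned in this abstract can be sketched as below. This is a generic illustration under stated assumptions, not the study's evaluation code: the function `precision_recall` and the (part, whole) pairs are hypothetical and chosen only to show the arithmetic.

```python
def precision_recall(predicted, gold):
    """Compute precision and recall for sets of extracted relation pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical (part, whole) pairs; illustrative only.
gold = {("wheel", "car"), ("engine", "car"), ("roof", "house"), ("page", "book")}
predicted = {("wheel", "car"), ("engine", "car"), ("car", "road")}

p, r = precision_recall(predicted, gold)
```

Here two of the three predicted pairs are correct (precision 2/3) and two of the four gold pairs are recovered (recall 1/2).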

“…Generally, a bootstrapping approach starts with a classifier trained with the initial corpus which is manually annotated (also called seed corpus) and then gradually improves the accuracy of the classifier through several re-training processes. Thus it is important to obtain the high quality of the initial corpus because the performance of the bootstrapping approach seriously depends on the training data for the initial classifier [27], [28]. However, it is sometimes hard to manually annotate sufficient amounts of the initial corpus.…”

Section: A. How To Automatically Generate An Initial Corpus (mentioning)

confidence: 99%

A Bootstrapping Approach With CRF and Deep Learning Models for Improving the Biomedical Named Entity Recognition in Multi-Domains

Kim, Seo 2019

IEEE Access

Biomedical named entity recognition (biomedical NER) is a core component to build biomedical text processing systems, such as biomedical information retrieval and question answering systems. Recently, many studies based on machine learning have been developed for a biomedical NER. The machine learning-based approaches generally require significant amounts of annotated corpora to achieve high performance. However, it is expensive to manually create a large number of high-quality corpora due to the demand for biomedical experts. In addition, most existing corpora have focused on several specific sub-domains, such as disease, protein, and species. It is difficult for a biomedical NER system trained with these corpora to provide much information for biomedical text processing systems. In this paper, we propose a method for automatically generating the machine-labeled biomedical NER corpus that covers various sub-domains by using proper categories from the semantic groups of a unified medical language system (UMLS). We use a bootstrapping approach with a small amount of manually annotated corpus to automatically generate a significant amount of corpus and then construct a biomedical NER system trained with the machine-labeled corpus. At last, we train two machine learning-based classifiers, conditional random fields (CRFs) and long short-term memory (LSTM), with the machine-labeled data to improve performance. The experimental results show that the proposed method is effective to improve performance. As a result, the proposed one obtains higher performance in 23.69% than the model that trained only a small amount of manually annotated corpus in F1-score.
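The bootstrapping loop this abstract refers to (train on a small seed corpus, label more data automatically, retrain) can be sketched in a few lines. This is only an illustration under loud assumptions: a one-dimensional nearest-centroid classifier stands in for the CRF and LSTM models, and the function names, the `min_margin` confidence threshold, and the toy data are all hypothetical.

```python
def train(labeled):
    """Return the mean feature value per label (a nearest-centroid model)."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(model, x):
    """Return (label, margin); margin is the distance gap between the
    nearest and second-nearest centroid, used as a confidence proxy."""
    dists = sorted((abs(x - c), y) for y, c in model.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin

def bootstrap(seed, unlabeled, rounds=3, min_margin=2.0):
    """Retrain repeatedly, each round adding confidently machine-labeled items."""
    labeled, pool = list(seed), list(unlabeled)
    for _ in range(rounds):
        model = train(labeled)
        confident, rest = [], []
        for x in pool:
            label, margin = predict(model, x)
            (confident if margin >= min_margin else rest).append((x, label))
        if not confident:
            break  # nothing new to learn from; stop early
        labeled.extend(confident)
        pool = [x for x, _ in rest]
    return train(labeled), labeled

# Toy seed corpus (manually labeled) and unlabeled pool; illustrative only.
seed = [(1.0, "neg"), (9.0, "pos")]
unlabeled = [0.5, 2.0, 4.5, 8.0, 10.0]
model, labeled = bootstrap(seed, unlabeled)
```

In this toy run the ambiguous point 4.5 never clears the margin threshold, so it stays out of the machine-labeled corpus while the four clear cases are added.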
