Findings of the Association for Computational Linguistics: EMNLP 2020
DOI: 10.18653/v1/2020.findings-emnlp.99

Multi²OIE: Multilingual Open Information Extraction Based on Multi-Head Attention with BERT

Abstract: In this paper, we propose Multi²OIE, which performs open information extraction (open IE) by combining BERT (Devlin et al., 2019) with multi-head attention blocks (Vaswani et al., 2017). Our model is a sequence-labeling system with an efficient and effective argument extraction method. We use a query, key, and value setting inspired by the Multimodal Transformer (Tsai et al., 2019) to replace the previously used bidirectional long short-term memory architecture with multi-head attention. Multi²OIE outperforms…
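The abstract describes a pipeline in which BERT labels predicate tokens and multi-head attention blocks replace the BiLSTM for argument extraction. The sketch below is a hypothetical, simplified PyTorch rendering of that idea, not the authors' implementation: the class name Multi2OIESketch, the tag counts, and the choice to use the averaged predicate vectors as attention key/value (with the sentence as query) are illustrative assumptions only.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class Multi2OIESketch(nn.Module):
    """Illustrative Multi2OIE-style tagger: BERT encoder, a linear predicate
    head, and multi-head attention blocks (in place of a BiLSTM) feeding a
    linear argument head."""

    def __init__(self, encoder_name="bert-base-multilingual-cased",
                 n_pred_tags=3, n_arg_tags=9, n_heads=8, n_blocks=4):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.pred_head = nn.Linear(hidden, n_pred_tags)   # BIO tags over predicates
        self.attn_blocks = nn.ModuleList(
            nn.MultiheadAttention(hidden, n_heads, batch_first=True)
            for _ in range(n_blocks)
        )
        self.arg_head = nn.Linear(hidden, n_arg_tags)     # BIO tags over arguments

    def forward(self, input_ids, attention_mask, pred_mask):
        # pred_mask: (batch, seq_len) boolean mask of predicate tokens, taken from
        # gold labels during training or from pred_head predictions at inference.
        h = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        pred_logits = self.pred_head(h)

        # Average the predicate token vectors into a single predicate feature and
        # let every sentence position attend to it (assumed query/key/value roles).
        m = pred_mask.unsqueeze(-1).float()
        pred_feat = (h * m).sum(dim=1, keepdim=True) / m.sum(dim=1, keepdim=True).clamp(min=1)

        x = h
        for block in self.attn_blocks:
            attn_out, _ = block(query=x, key=pred_feat, value=pred_feat)
            x = x + attn_out                               # residual connection
        arg_logits = self.arg_head(x)
        return pred_logits, arg_logits
```

A real system would additionally decode the tag sequences and run one argument-extraction pass per predicate found in the sentence; this sketch only shows the overall data flow.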

Cited by 31 publications (56 citation statements)
References 35 publications
“…The line of work on OIE starts with systems relying on distant supervision [11,12], and rule-based paradigms that focus on the grammatical and syntactic properties of the language [13,14]. An abundance of learning-based systems that leverage annotated data sources to train classifiers has been proposed [15,16], with more recent implementations making use of pretrained language models [17,18]. Despite the existence of so many approaches, however, the majority focus only on evaluating the effectiveness of different triple extraction tools on raw data, without incorporating any preprocessing strategies to limit the number of potentially uninformative triples [19].…”
Section: Information Extraction (mentioning)
confidence: 99%
“…Many open information extraction (OIE) systems, e.g., Stanford OpenIE (Angeli et al., 2015), OLLIE (Schmitz et al., 2012), Reverb (Fader et al., 2011), and their descendant Open IE4, leverage carefully designed linguistic patterns (e.g., based on dependencies and POS tags) to extract triples from textual corpora without using additional training sets. Recently, supervised OIE systems (Stanovsky et al., 2018; Ro et al., 2020; Kolluru et al., 2020) formulate OIE as a sequence generation problem using neural networks trained on additional training sets. Similar to our work, Wang et al. (2020) use the parameters of LMs to extract triples, with the main difference that DEEPEX not only improves the recall of the beam search, but also uses a pre-trained ranking model to enhance the zero-shot capability.…”
Section: Related Work (mentioning)
confidence: 99%
“…In order to extract triples, most approaches try to identify linguistic extraction patterns, either hand-crafted or automatically learned from the data. An abundance of such systems exists, relying on concepts ranging from rule-based paradigms that focus on the grammatical and syntactic properties of the language (Fader et al., 2011; Del Corro and Gemulla, 2013), to supervised learning-based ones that leverage annotated data sources to train classifiers, with more recent implementations making use of language models (Kolluru et al., 2020; Ro et al., 2020). Despite the existence of so many approaches, however, the majority of them focus only on evaluating the efficiency of different triple extraction tools on raw data, without incorporating any preprocessing strategies to limit the number of potentially uninformative triples (Niklaus et al., 2018).…”
Section: Information Extraction (mentioning)
confidence: 99%
“…The test set was automatically translated using our EN2EL mixed case model. We compare our extraction results with Multi2OIE from Ro et al. (2020), an OIE engine with state-of-the-art performance on English corpora. Multi2OIE relies on the pretrained multilingual BERT model and can perform multilingual extractions through zero-shot learning (it is trained on English data); thus it can be leveraged to produce results on the Greek CaRB test set.…”
Section: OIE Performance (mentioning)
confidence: 99%
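The quoted evaluation exploits the fact that a model fine-tuned on English data over multilingual BERT can be applied to Greek input without retraining. The snippet below is a hypothetical usage of the Multi2OIESketch class sketched earlier; the Greek sentence and the zeroed predicate mask are placeholders, and a real run would load weights fine-tuned on English OIE data.

```python
from transformers import AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = Multi2OIESketch()        # in practice, load weights fine-tuned on English OIE data
model.eval()

# Greek input: "Homer wrote the Odyssey." No Greek training data is needed, because
# the multilingual encoder shares one vocabulary and parameter set across languages.
enc = tokenizer("Ο Όμηρος έγραψε την Οδύσσεια.", return_tensors="pt")
with torch.no_grad():
    pred_logits, _ = model(
        enc["input_ids"], enc["attention_mask"],
        pred_mask=torch.zeros_like(enc["input_ids"], dtype=torch.bool),  # placeholder mask
    )
pred_tags = pred_logits.argmax(dim=-1)   # predicate BIO tags; argument tagging would follow
```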