Learning to Answer Questions from Wikipedia Infoboxes

Morales, A. L.; Premtoon, Varot; Avery, Cordelia; Felshin, Sue; Katz, Boris

doi:10.18653/v1/d16-1199

Cited by 13 publications (11 citation statements)

References 13 publications

“…educated at (Turing, Princeton)), collecting roughly 100,000 natural-language questions to support QA against a knowledge graph. Morales et al (2016) used a similar process to collect questions from Wikipedia infoboxes, yielding the 15,000-example InfoboxQA dataset. For the task of identifying predicate-argument structures, QA-SRL (He et al, 2015) was proposed as an open schema for semantic roles, in which the relation between an argument and a predicate is expressed as a natural-language question containing the predicate ("Where was someone educated?")…”

Section: Negative Examples (mentioning)

confidence: 99%

Zero-Shot Relation Extraction via Reading Comprehension

Levy, Seo, Choi, et al. 2017

Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

357

334

We show that relation extraction can be reduced to answering simple reading comprehension questions, by associating one or more natural-language questions with each relation slot. This reduction has several advantages: we can (1) learn relation-extraction models by extending recent neural reading-comprehension techniques, (2) build very large training sets for those models by combining relation-specific crowd-sourced questions with distant supervision, and even (3) do zero-shot learning by extracting new relation types that are only specified at test-time, for which we have no labeled training examples. Experiments on a Wikipedia slot-filling task demonstrate that the approach can generalize to new questions for known relation types with high accuracy, and that zero-shot generalization to unseen relation types is possible, at lower accuracy levels, setting the bar for future work on this task.
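A minimal sketch of the reduction described in this abstract, assuming a pluggable reader: each relation slot is associated with natural-language question templates, the subject entity is substituted into them, and a reading-comprehension model either returns an answer span or abstains. The `TEMPLATES` dictionary, the `XXX` placeholder, and `toy_reader` are hypothetical; `toy_reader` is a string-matching stand-in for a neural reader, not the model from the paper.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical relation -> question templates; "XXX" marks where the subject
# entity goes. Illustrative only, not drawn from the paper's crowd-sourced data.
TEMPLATES: Dict[str, List[str]] = {
    "educated_at": ["Where was XXX educated?", "Which university did XXX attend?"],
    "spouse": ["Who is XXX married to?"],
}

# A reader maps (question, sentence) to an answer span, or None to abstain.
Reader = Callable[[str, str], Optional[str]]


def relation_to_questions(relation: str, subject: str) -> List[str]:
    """Instantiate every template of a relation with the subject entity."""
    return [t.replace("XXX", subject) for t in TEMPLATES.get(relation, [])]


def fill_slot(relation: str, subject: str, sentence: str, reader: Reader) -> Optional[str]:
    """Slot filling as QA: ask each question in turn, return the first answer."""
    for question in relation_to_questions(relation, subject):
        answer = reader(question, sentence)
        if answer is not None:
            return answer
    return None  # no question was answerable from this sentence


def toy_reader(question: str, sentence: str) -> Optional[str]:
    """Hypothetical stand-in for a neural reading-comprehension model:
    answers 'Where ...' questions with the text after ' at '."""
    if question.startswith("Where") and " at " in sentence:
        return sentence.split(" at ", 1)[1].rstrip(".")
    return None


print(fill_slot("educated_at", "Alan Turing", "Alan Turing studied at Princeton.", toy_reader))
# prints Princeton
```

Because a relation is specified only through its templates, a new relation type can be added at test time as a new dictionary entry without touching the code, which loosely mirrors the zero-shot setting the abstract describes; answer quality then depends entirely on the reader.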

“…WIKIPEDIA contains an abundance of human-curated, multi-domain information and has several structured resources such as infoboxes and WIKIDATA (Vrandečić, 2012) associated with it. WIKIPEDIA has thus been used for a wealth of research to build datasets posing queries about a single sentence (Morales et al, 2016; Levy et al, 2017) or article (Yang et al, 2015; Hewlett et al, 2016; Rajpurkar et al, 2016). However, no attempt has been made to construct a cross-document multi-step RC dataset based on WIKIPEDIA.…”

Section: Wikihop (mentioning)

confidence: 99%

Constructing Datasets for Multi-hop Reading Comprehension Across Documents

Welbl, Stenetorp, Riedel 2018

TACL

418

426

Most Reading Comprehension methods limit themselves to queries which can be answered using a single sentence, paragraph, or document. Enabling models to combine disjoint pieces of textual evidence would extend the scope of machine comprehension methods, but currently no resources exist to train and test this capability. We propose a novel task to encourage the development of models for text understanding across multiple documents and to investigate the limits of existing methods. In our task, a model learns to seek and combine evidence, effectively performing multihop (alias multi-step) inference. We devise a methodology to produce datasets for this task, given a collection of query-answer pairs and thematically linked documents. Two datasets from different domains are induced, and we identify potential pitfalls and devise circumvention strategies. We evaluate two previously proposed competitive models and find that one can integrate information across documents. However, both models struggle to select relevant information; and providing documents guaranteed to be relevant greatly improves their performance. While the models outperform several strong baselines, their best accuracy reaches 54.5% on an annotated test set, compared to human performance at 85.0%, leaving ample room for improvement.

“…Articles are organized in hierarchical sections, and many have an "infobox," a table that summarizes key information in the article. To access these kinds of information, we developed WikipediaBase (Morales, 2016), a system that turns Wikipedia into a virtual database and organizes it in an object-property-value data model. We consider infobox attributes and section headers to be properties.…”
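To make the object-property-value data model concrete, here is a minimal sketch, assuming articles are keyed by title and that an infobox attribute takes precedence over a section header with the same name. `WikiObject`, `VirtualDatabase`, that precedence rule, and the sample values are hypothetical illustrations, not WikipediaBase's actual API or data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class WikiObject:
    """One article viewed as an object (hypothetical structure)."""
    title: str
    infobox: Dict[str, str] = field(default_factory=dict)   # attribute -> value
    sections: Dict[str, str] = field(default_factory=dict)  # header -> text


class VirtualDatabase:
    """Object-property-value view: infobox attributes and section headers
    are both exposed as properties of an article object."""

    def __init__(self) -> None:
        self._objects: Dict[str, WikiObject] = {}

    def add(self, obj: WikiObject) -> None:
        self._objects[obj.title] = obj

    def properties(self, title: str) -> List[str]:
        """All property names: infobox attributes first, then section headers."""
        obj = self._objects[title]
        return list(obj.infobox) + [h for h in obj.sections if h not in obj.infobox]

    def get(self, title: str, prop: str) -> Optional[str]:
        """Value of a property, or None if the object or property is unknown."""
        obj = self._objects.get(title)
        if obj is None:
            return None
        if prop in obj.infobox:  # assumed precedence: infobox over section header
            return obj.infobox[prop]
        return obj.sections.get(prop)


db = VirtualDatabase()
db.add(WikiObject("Alan Turing",
                  infobox={"alma_mater": "Princeton University"},
                  sections={"Early life": "(section text)"}))
print(db.properties("Alan Turing"))  # ['alma_mater', 'Early life']
print(db.get("Alan Turing", "alma_mater"))  # Princeton University
```

Treating both kinds of property uniformly means a query only needs an (object, property) pair, regardless of whether the answer lives in a table row or under a section heading.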

Section: Start Parses These Annotations and Stores The Parsed Structu… (mentioning)

confidence: 99%

“…To address these types of questions, we compiled a crowdsourced corpus of over 15,000 questions about Wikipedia infoboxes. We used these questions to train a machine learning model that selects the correct response from a set of candidate answers with high accuracy (Morales, 2016; Morales, Premtoon, Avery, Felshin, & Katz, 2016). Our ongoing work in automatic techniques to answer questions will allow the START system to quickly scale up to new types of questions and information sources.…”
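A minimal sketch of selecting an answer from a set of infobox candidates, assuming each (attribute, value) row is one candidate. The Jaccard token-overlap `score` is a deliberately simple technique swapped in for the trained model the quote mentions; `tokens`, `score`, `select_answer`, and the sample infobox values are hypothetical.

```python
import re
from typing import Dict, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens; underscores and apostrophes split words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(question: str, attribute: str) -> float:
    """Jaccard overlap between question tokens and attribute-name tokens
    (a hypothetical stand-in for a learned scoring model)."""
    q, a = tokens(question), tokens(attribute)
    union = q | a
    return len(q & a) / len(union) if union else 0.0


def select_answer(question: str, infobox: Dict[str, str]) -> Tuple[str, str]:
    """Score every (attribute, value) row of a non-empty infobox as a
    candidate and return the highest-scoring one (ties: first row wins)."""
    best = max(infobox, key=lambda attr: score(question, attr))
    return best, infobox[best]


box = {"birth_place": "Maida Vale, London",
       "alma_mater": "Princeton University",
       "known_for": "Turing machine"}
print(select_answer("What is Alan Turing's alma mater?", box))
# ('alma_mater', 'Princeton University')
```

Pure overlap breaks down on paraphrases: "Where was Alan Turing born?" shares no token with `birth_place`, so every candidate scores zero and the first row wins by default. That gap is one reason to learn the scorer from a corpus of crowdsourced questions instead.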

Section: Start Parses These Annotations and Stores The Parsed Structu… (mentioning)

confidence: 99%