2012
DOI: 10.1147/jrd.2012.2185901

Textual resource acquisition and engineering

Abstract: A key requirement for high-performing question-answering (QA) systems is access to high-quality reference corpora from which answers to questions can be hypothesized and evaluated. However, the topic of source acquisition and engineering has received very little attention so far. This is because most existing systems were developed under organized evaluation efforts that included reference corpora as part of the task specification. The task of answering Jeopardy! questions, on the other hand, does not come wi…

Cited by 24 publications (16 citation statements)
References 17 publications
“…The Hypothesis Generation phase takes as input results from question analysis, summarized in the previous section. The first four primary search components in the diagram show Watson's Document and Passage search strategies, which target unstructured knowledge resources such as encyclopedia documents and newswire articles [10]. On the other hand, the last two search components, namely, Answer Lookup and PRISMATIC search, use different types of structured resources.…”
Section: Search and Candidate Generation Overview
mentioning
confidence: 99%
“…and TREC in an iterative error analysis performed by the Watson development team. The collection (subsequently referred to as All Sources) comprises 25.6 GB of text, including Wikipedia and the other encyclopedias in Section 6.2, dictionaries such as Wiktionary, thesauri, newswire sources such as a New York Times archive, literature and other sources of trivia knowledge [Chu-Carroll et al, 2012b]. It also includes the AQUAINT newswire corpus, which was the reference source in TREC 11-15 and contains the answers to all questions in these datasets (except NIL questions, which were not used in our experiments).…”
Section: Experimental Setup Using Watson
mentioning
confidence: 99%
“…challenge used a broad variety of content sources, primary among which were Wikipedia and Wiktionary; the motivation for source selection is presented in Chu-Carroll et al (2012c).…”
Section: Hypothesis Generation
mentioning
confidence: 99%