Question Answering Pilot Task at CLEF 2004

Herrera, Jesús; Peñas, Anselmo; Verdejo, Felisa

doi:10.1007/11519645_57

Cited by 18 publications

(10 citation statements)

References 11 publications

(6 reference statements)

“…This year two additional evaluation measures, i.e. the K1 value and r coefficient, borrowed by [2], were experimentally introduced, in order to find a comprehensive measure which takes into account both accuracy and confidence. Anyway, since confidence was an additional and optional value, only some systems could be assigned the CWS, and consequently the K1 and r coefficient; therefore an analysis based on these measures is not very significant at the moment.…”

Section: Fig 2 Best and Average Results in the QA@CLEF Campaigns (mentioning)

confidence: 99%

Overview of the CLEF 2004 Multilingual Question Answering Track

Magnini, Vallin, Ayache et al. 2005

Lecture Notes in Computer Science

Self Cite

Abstract. The general aim of the third CLEF Multilingual Question Answering Track was to set up a common and replicable evaluation framework to test both monolingual and cross-language Question Answering (QA) systems that process queries and documents in several European languages. Nine target languages and ten source languages were exploited to enact 8 monolingual and 73 cross-language tasks. Twenty-four groups participated in the exercise. Overall results showed a general increase in performance in comparison to last year. The best performing monolingual system irrespective of target language answered 64.5% of the questions correctly (in the monolingual Portuguese task), while the average of the best performances for each target language was 42.6%. The cross-language step instead entailed a considerable drop in performance. In addition to accuracy, the organisers also measured the relation between the correctness of an answer and a system's stated confidence in it, showing that the best systems did not always provide the most reliable confidence score. We provide an overview of the 2005 QA track, detail the procedure followed to build the test sets and present a general analysis of the results.

“…a new type of questions, and two new measures, namely K1 measure and r value. Both question type and measure were borrowed from the Spanish pilot task proposed at CLEF 2004 [2].…”

Section: Tasks (mentioning)

confidence: 99%

Overview of the CLEF 2007 Multilingual Question Answering Track

Giampiccolo, Forner, Herrera et al. 2008

Lecture Notes in Computer Science

Self Cite

The general aim of the third CLEF Multilingual Question Answering Track was to set up a common and replicable evaluation framework to test both monolingual and cross-language Question Answering (QA) systems that process queries and documents in several European languages. Nine target languages and ten source languages were exploited to enact 8 monolingual and 73 cross-language tasks. Twenty-four groups participated in the exercise. Overall results showed a general increase in performance in comparison to last year. The best performing monolingual system irrespective of target language answered 64.5% of the questions correctly (in the monolingual Portuguese task), while the average of the best performances for each target language was 42.6%. The cross-language step instead entailed a considerable drop in performance. In addition to accuracy, the organisers also measured the relation between the correctness of an answer and a system's stated confidence in it, showing that the best systems did not always provide the most reliable confidence score.

“…Alternatives would be to refine these questions by making the temporal restriction explicit or to extend the gold standard by answers that are to be considered correct if working on the web. Table 1 contains evaluation results for InSicht-W3: the percentages of right, inexact, and wrong answers (separately for non-empty answers and empty answers) and the K1-measure (see (Herrera et al, 2005) for a definition). For comparison, the results of the textual QA system InSicht on the QA@CLEF document collection are shown in the first row.…”

Section: Language-specific Problems (mentioning)

confidence: 99%

Adapting a semantic question answering system to the web

Hartrumpf 2006

Proceedings of the Workshop on Multilingual Question Answering - MLQA '06

This paper describes how a question answering (QA) system developed for small-sized document collections of several million sentences was modified in order to work with a monolingual subset of the web. The basic QA system relies on complete sentence parsing, inferences, and semantic representation matching. The extensions and modifications needed for useful and quick answers from web documents are discussed. The main extension is a two-level approach that first accesses a web search engine and downloads some of its document hits and then works similarly to the basic QA system. Most modifications are restrictions like a maximal number of documents and a maximal length of investigated document parts; they ensure acceptable answer times. The resulting web QA system is evaluated on the German test collection from QA@CLEF 2004. Several parameter settings and strategies for accessing the web search engine are investigated. The main results are: precision-oriented extensions and experimentally derived parameter settings are needed to achieve similar performance on the web as on small-sized document collections that show higher homogeneity and quality of the contained texts; adapting a semantic QA system to the web is feasible, but answering a question is still expensive in terms of bandwidth and CPU time.
