2005
DOI: 10.1007/11519645_57

Question Answering Pilot Task at CLEF 2004

Abstract: A pilot Question Answering task was run at the Cross-Language Evaluation Forum (CLEF) 2004 with a twofold objective: first, to evaluate Question Answering systems when they have to answer conjunctive lists, disjunctive lists, and questions with temporal restrictions; second, to evaluate systems' ability to give an accurate self-score of their confidence in their answers. To this end, two measures have been designed to be applied on all these different typ…

Cited by 18 publications (10 citation statements)
References: 11 publications (6 reference statements)
“…This year two additional evaluation measures, i.e. the K1 value and r coefficient, borrowed by [2], were experimentally introduced, in order to find a comprehensive measure which takes into account both accuracy and confidence. Anyway, since confidence was an additional and optional value, only some systems could be assigned the CWS, and consequently the K1 and r coefficient; therefore an analysis based on these measures is not very significant at the moment.…”
Section: Fig. 2, Best and Average Results in the QA@CLEF Campaigns (mentioning)
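The citing snippets name three confidence-aware measures (CWS, K1, and the r coefficient) without defining them. The following is a minimal sketch under stated assumptions, not the pilot task's official scorer: CWS follows the TREC 2002 confidence-weighted score (answers sorted by decreasing confidence, precision averaged over all ranks); K1 is assumed to be the mean of confidence × (+1 for a correct answer, −1 otherwise), with one answer per question; r is assumed to be the Pearson correlation between a system's self-score and the correctness of its answers. The authoritative definitions are in Herrera et al. (2005), and the function names `cws`, `k1`, and `r_coefficient` are hypothetical.

```python
from typing import Sequence


def cws(correct: Sequence[bool], confidence: Sequence[float]) -> float:
    """Confidence-weighted score in the TREC 2002 style: sort answers by
    decreasing confidence and average the precision at every rank."""
    order = sorted(range(len(correct)), key=lambda i: confidence[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if correct[i]:
            hits += 1
        total += hits / rank  # precision over the first `rank` answers
    return total / len(order)


def k1(correct: Sequence[bool], confidence: Sequence[float]) -> float:
    """ASSUMED K1: mean of confidence * (+1 if correct else -1),
    with exactly one answer per question; ranges over [-1, 1]."""
    signed = (c if ok else -c for ok, c in zip(correct, confidence))
    return sum(signed) / len(correct)


def r_coefficient(correct: Sequence[bool], confidence: Sequence[float]) -> float:
    """ASSUMED r: Pearson correlation between self-score and correctness (1/0)."""
    n = len(correct)
    xs = list(confidence)
    ys = [1.0 if ok else 0.0 for ok in correct]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0  # correlation is undefined for constant input; report 0 here
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical run: four questions, two answered correctly.
    correct = [True, False, True, False]
    confidence = [0.9, 0.8, 0.6, 0.1]
    print(cws(correct, confidence), k1(correct, confidence), r_coefficient(correct, confidence))
```

Under these assumed definitions, CWS only rewards placing correct answers ahead of incorrect ones in the confidence ranking, whereas K1 also subtracts the confidence attached to each wrong answer, so a system that is confidently wrong is penalized directly.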
“…a new type of questions, and two new measures, namely K1 measure and r value. Both question type and measure were borrowed from the Spanish pilot task proposed at CLEF 2004 [2].…”
Section: Tasks (mentioning)
“…Alternatives would be to refine these questions by making the temporal restriction explicit or to extend the gold standard by answers that are to be considered correct if working on the web. Table 1 contains evaluation results for InSicht-W3: the percentages of right, inexact, and wrong answers (separately for non-empty answers and empty answers) and the K1-measure (see (Herrera et al, 2005) for a definition). For comparison, the results of the textual QA system InSicht on the QA@CLEF document collection are shown in the first row.…”
Section: Language-specific Problems (mentioning)