2007
DOI: 10.1007/978-3-540-74999-8_32
Overview of the Answer Validation Exercise 2006

Abstract: The first Answer Validation Exercise (AVE) was launched at the Cross Language Evaluation Forum 2006. This task is aimed at developing systems able to decide whether the answer of a Question Answering system is correct or not. The exercise is described here together with the evaluation methodology and the systems' results. The starting point for the AVE 2006 was the reformulation of Answer Validation as a Recognizing Textual Entailment problem, under the assumption that hypotheses can be automatically g…

Cited by 35 publications (33 citation statements)
References 6 publications
“…In [7] it was argued why the AVE evaluation is based on the detection of the correct answers. Instead of using overall accuracy as the evaluation measure, we proposed the use of precision (1), recall (2) and F-measure (3) (harmonic mean) over answers that must be VALIDATED.…”
Section: Evaluation of the Answer Validation Exercise
confidence: 99%
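The evaluation described in that statement scores only the VALIDATED class rather than overall accuracy. A minimal sketch of such a scorer, assuming the gold and predicted labels are the strings "VALIDATED" and "REJECTED" (the label names and function are illustrative, not the AVE reference implementation):

```python
from typing import List, Tuple

def validation_scores(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Precision, recall and F-measure computed only over the
    VALIDATED class, in the spirit of the AVE evaluation."""
    assert len(gold) == len(pred)
    tp = sum(1 for g, p in zip(gold, pred) if g == "VALIDATED" and p == "VALIDATED")
    fp = sum(1 for g, p in zip(gold, pred) if g != "VALIDATED" and p == "VALIDATED")
    fn = sum(1 for g, p in zip(gold, pred) if g == "VALIDATED" and p != "VALIDATED")
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure is the harmonic mean of precision and recall
    f = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f
```

Note that a system answering REJECTED everywhere scores 0 under these measures, which is exactly the degenerate baseline that overall accuracy would reward when correct answers are rare.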
“…The first Answer Validation Exercise (AVE 2006) [7] was activated last year in order to promote the development and evaluation of subsystems aimed at validating the correctness of the answers given by QA systems. In some sense, systems must emulate human assessment of QA responses and decide whether an answer is correct or not according to a given text.…”
Section: Introduction
confidence: 99%
“…the AVE 2006 Working Notes (Peñas et al, 2006). Most of the groups use lexical or syntactic overlapping as features for machine learning; other groups derive the logic or semantic representations of natural language texts and perform proving.…”
Section: Introduction and Related Work
confidence: 99%
“…MAVE was evaluated on the AVE 2007 test set for German [1]; see Table 1 which also lists the reference results of the current version of MAVE. Here, CF means clustering of answers and optimizing thresholds for f-measure, CQ means clustering and optimizing for qa-accuracy, EF means ERA method and optimizing for f-measure, EQ means ERA optimizing for qa-accuracy, and * marks the current results of MAVE.…”
Section: Discussion
confidence: 99%
“…c) For a complete answer validation, also mark the remaining items as VALIDATED or REJECTED. While machine learning and approaches to recognizing textual entailment are popular choices for answer validation -see [1] for an overview of the techniques used in the Answer Validation Exercise (AVE) 2007 -it is more typical of QA systems to exploit redundancy by aggregating evidence (e.g. by selecting the most frequent answers).…”
Section: Introduction
confidence: 99%
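The redundancy-based aggregation mentioned in the last statement (selecting the most frequent answer across candidate responses) can be sketched as follows; the normalization step and the function name are assumptions for illustration, not part of the cited system:

```python
from collections import Counter
from typing import List

def most_frequent_answer(candidates: List[str]) -> str:
    """Redundancy-based answer aggregation: normalize candidate
    answers, then return the variant that occurs most often."""
    if not candidates:
        raise ValueError("no candidate answers to aggregate")
    # Simple normalization so surface variants vote together
    counts = Counter(a.strip().lower() for a in candidates)
    answer, _count = counts.most_common(1)[0]
    return answer
```

In contrast to validation against a supporting text, this strategy needs no entailment decision at all: it treats agreement among independent candidate answers as evidence of correctness.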