2007
DOI: 10.1007/978-3-540-74999-8_32
Overview of the Answer Validation Exercise 2006

Abstract: The first Answer Validation Exercise (AVE) was launched at the Cross Language Evaluation Forum 2006. This task is aimed at developing systems able to decide whether the answer of a Question Answering system is correct or not. The exercise is described here together with the evaluation methodology and the systems' results. The starting point for the AVE 2006 was the reformulation of Answer Validation as a Recognizing Textual Entailment problem, under the assumption that hypotheses can be automatically g…

Cited by 35 publications (33 citation statements)
References 6 publications
“…In [7] it was argued why the AVE evaluation is based on the detection of the correct answers. Instead of using overall accuracy as the evaluation measure, we proposed the use of precision (1), recall (2) and F-measure (3) (harmonic mean) over answers that must be VALIDATED.…”
Section: Evaluation of the Answer Validation Exercise
confidence: 99%
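The evaluation described in that statement scores only the VALIDATED class rather than overall accuracy. A minimal sketch of such a scorer, assuming the gold and predicted labels are the strings "VALIDATED" and "REJECTED" (the label names and function are illustrative, not the AVE reference implementation):

```python
from typing import List, Tuple

def validation_scores(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Precision, recall and F-measure computed only over the
    VALIDATED class, in the spirit of the AVE evaluation."""
    assert len(gold) == len(pred)
    tp = sum(1 for g, p in zip(gold, pred) if g == "VALIDATED" and p == "VALIDATED")
    fp = sum(1 for g, p in zip(gold, pred) if g != "VALIDATED" and p == "VALIDATED")
    fn = sum(1 for g, p in zip(gold, pred) if g == "VALIDATED" and p != "VALIDATED")
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure is the harmonic mean of precision and recall
    f = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f
```

Note that a system answering REJECTED everywhere scores 0 under these measures, which is exactly the degenerate baseline that overall accuracy would reward when correct answers are rare.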
“…The first Answer Validation Exercise (AVE 2006) [7] was activated last year in order to promote the development and evaluation of subsystems aimed at validating the correctness of the answers given by QA systems. In some sense, systems must emulate human assessment of QA responses and decide whether an answer is correct or not according to a given text.…”
Section: Introduction
confidence: 99%
“…the AVE 2006 Working Notes (Peñas et al, 2006). Most of the groups use lexical or syntactic overlapping as features for machine learning; other groups derive the logic or semantic representations of natural language texts and perform proving.…”
Section: Introduction and Related Work
confidence: 99%
“…MAVE was evaluated on the AVE 2007 test set for German [1]; see Table 1 which also lists the reference results of the current version of MAVE. Here, CF means clustering of answers and optimizing thresholds for f-measure, CQ means clustering and optimizing for qa-accuracy, EF means ERA method and optimizing for f-measure, EQ means ERA optimizing for qa-accuracy, and * marks the current results of MAVE.…”
Section: Discussion
confidence: 99%
“…c) For a complete answer validation, also mark the remaining items as VALIDATED or REJECTED. While machine learning and approaches to recognizing textual entailment are popular choices for answer validation -see [1] for an overview of the techniques used in the Answer Validation Exercise (AVE) 2007 -it is more typical of QA systems to exploit redundancy by aggregating evidence (e.g. by selecting the most frequent answers).…”
Section: Introduction
confidence: 99%
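The redundancy-based aggregation mentioned in the last statement (selecting the most frequent answer across candidate responses) can be sketched as follows; the normalization step and the function name are assumptions for illustration, not part of the cited system:

```python
from collections import Counter
from typing import List

def most_frequent_answer(candidates: List[str]) -> str:
    """Redundancy-based answer aggregation: normalize candidate
    answers, then return the variant that occurs most often."""
    if not candidates:
        raise ValueError("no candidate answers to aggregate")
    # Simple normalization so surface variants vote together
    counts = Counter(a.strip().lower() for a in candidates)
    answer, _count = counts.most_common(1)[0]
    return answer
```

In contrast to validation against a supporting text, this strategy needs no entailment decision at all: it treats agreement among independent candidate answers as evidence of correctness.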