Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, 2014
DOI: 10.1145/2600428.2609485

Evaluating answer passages using summarization measures

Abstract: Passage-based retrieval models have been studied for some time and have been shown to have some benefits for document ranking. Finding passages that are not only topically relevant, but are also answers to the users' questions would have a significant impact in applications such as mobile search. To develop models for answer passage retrieval, we need to have appropriate test collections and evaluation measures. Making annotations at the passage level is, however, expensive and can have poor coverage. In this …
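The abstract describes evaluating answer passages with summarization measures. As a rough, hedged illustration of that family of measures, the Python sketch below computes a ROUGE-1-style unigram recall of a candidate passage against a reference answer; the function name, whitespace tokenization, and the choice of ROUGE-1 specifically are assumptions made for illustration, not the paper's exact evaluation protocol.

from collections import Counter

def rouge_n_recall(candidate, reference, n=1):
    """N-gram recall of `candidate` against a single `reference` string (illustrative)."""
    def ngrams(text, size):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1))

    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    if not ref:
        return 0.0
    # Clipped overlap: each reference n-gram counts at most as often as it appears.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / sum(ref.values())

# Toy usage: score a retrieved passage against a manually annotated answer passage.
passage = "passage retrieval models rank contiguous sentences as answers to a question"
answer = "an answer passage contains contiguous sentences that answer the question"
print(f"ROUGE-1 recall = {rouge_n_recall(passage, answer):.2f}")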


Cited by 29 publications (18 citation statements)
References 11 publications
“…We found that the average term-level kappa ratio between two different human summaries is 0.33. This agreement is comparable with previous studies, which reported a kappa of approximately 0.35 and 0.39, respectively, for manual summaries of news reports and columns (Hori, Hirao, & Isozaki) and 0.38 for manually annotated answer passages (Keikha, Park, & Croft, 2014).…”
Section: Experiments, Results and Analysis (supporting)
confidence: 90%
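The kappa figures quoted above are term-level inter-annotator agreement scores. As a minimal sketch of how such a value can be computed, the Python snippet below implements Cohen's kappa over binary term-selection labels from two annotators; the binary selected/not-selected framing and the toy labels are illustrative assumptions, not the cited studies' data.

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length binary label sequences."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of terms both annotators labeled the same way.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's overall selection rate.
    pa1, pb1 = sum(labels_a) / n, sum(labels_b) / n
    p_e = pa1 * pb1 + (1 - pa1) * (1 - pb1)
    return (p_o - p_e) / (1 - p_e) if p_e < 1.0 else 1.0

# Toy usage: two annotators marking which of a passage's 10 terms belong in a summary.
annotator_1 = [1, 1, 0, 0, 1, 0, 0, 1, 0, 0]
annotator_2 = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
print(f"term-level kappa = {cohen_kappa(annotator_1, annotator_2):.2f}")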
“…There have been previous efforts on developing benchmark data sets for non-factoid question answering or answer passage retrieval [4,7,20]. Perhaps the closest prior research to our work is the WebAP data set created by Keikha et al. [7,20]. Compared to WebAP, WikiPassageQA has two significant differences: (1) the number of questions in WikiPassageQA is significantly larger than that of WebAP (4165 vs.…”
Section: Existing Related Datasets (mentioning)
confidence: 99%
“…Currently, there is only one collection specifically created for retrieving answer passages in documents, WebAP [7], where contiguous sentences of a document are labeled as relevant to a query.…”
Section: Introduction (mentioning)
confidence: 99%
“…The manual inspection was done on 20% of each worker's submission as well as the QA pairs with no agreement. [Footnote 7: Like, for example, "Can someone explain the theory of e = mc²?"; Footnote 8: We increased the previous assignment limit to 10,000 for annotating the test set.]…”
Section: Relevance Assessment (mentioning)
confidence: 99%
“…Despite the widely-known importance of studying answer passage retrieval for non-factoid questions [1,2,8,20], the research progress for this task is limited by the availability of high-quality public data. Some existing collections, e.g., [8,14], consist of few queries, which are not sufficient to train sophisticated machine learning models for the task. Some others, e.g., [1], significantly suffer from incomplete judgments.…”
Section: Introduction (mentioning)
confidence: 99%