Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2000
DOI: 10.1145/345508.345539

Do batch and user evaluations give the same results?

Abstract: Do improvements in system performance demonstrated by batch evaluations confer the same benefit for real users? We carried out experiments designed to investigate this question. After identifying a weighting scheme that gave maximum improvement over the baseline in a noninteractive evaluation, we used it with real users searching on an instance recall task. Our results showed the weighting scheme giving beneficial results in batch studies did not do so with real users. Further analysis did identify other factors …

Cited by 109 publications (80 citation statements). References 9 publications (1 reference statement).
“…However, criticism has been raised on the assumption that offline evaluation could predict an algorithm's effectiveness in online evaluations or user studies. More precisely, several researchers have shown that results from offline evaluations do not necessarily correlate with results from user studies or online evaluations [93, 269, 270, 278-281]. This means that approaches that are effective in offline evaluations are not necessarily effective in real-world recommender systems.…”
Section: Offline Evaluations
confidence: 99%
“…Interestingly, the three studies with the most participants were all conducted by the authors of TechLens [26, 93, 117], who are also the only authors in the field of research-paper recommender systems who discuss the potential shortcomings of offline evaluations [87]. It seems that other researchers in this field are not aware of, or chose not to address, problems associated with offline evaluations, although there has been quite a discussion outside the research-paper recommender-system community [93, 269, 270, 278-281].…”
Section: Offline Evaluations
confidence: 99%
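As a rough illustration of the mismatch these citing papers describe, the sketch below ranks a few retrieval systems by a batch metric and by a user-study outcome and measures rank agreement with Kendall's tau. All system names and scores are hypothetical placeholders, not data from the cited studies; Python with SciPy is assumed.

    # Hypothetical per-system scores -- not taken from the cited paper.
    from scipy.stats import kendalltau

    offline_score = {"bm25": 0.31, "bm25_prf": 0.36, "lm_dirichlet": 0.34}   # e.g. mean average precision
    user_score = {"bm25": 0.58, "bm25_prf": 0.55, "lm_dirichlet": 0.60}      # e.g. instance recall with users

    systems = sorted(offline_score)
    tau, p_value = kendalltau([offline_score[s] for s in systems],
                              [user_score[s] for s in systems])
    print(f"Kendall's tau between offline and user-study rankings: {tau:.2f} (p = {p_value:.2f})")
    # A low or negative tau is the pattern these citations report: the system that
    # wins the batch evaluation is not necessarily the one users succeed with.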
“…It is generally known that users' queries retrieve different documents than the batch queries used in system-centered evaluations, so it is possible that subjects will find documents that were not included in the relevance pools [129]. If a document was not in the pool, then it would not have been judged by the original assessor.…”
Section: TREC Collections
confidence: 99%
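The pooling issue in the passage above can be made concrete with a small sketch: given a TREC-style qrels file, it reports what fraction of the documents a real user retrieved were never judged at all. The qrels file name, topic id, and document ids shown in the usage comment are hypothetical.

    # Minimal sketch of the pooling issue: documents a user retrieves that never
    # entered the judged pool carry no relevance label, so batch metrics silently
    # treat them as non-relevant. Qrels lines follow the TREC layout:
    #   <topic> <iteration> <doc_id> <judgment>
    def load_judged_docs(qrels_path):
        """Map topic id -> set of judged document ids."""
        judged = {}
        with open(qrels_path) as f:
            for line in f:
                topic, _iteration, doc_id, _judgment = line.split()
                judged.setdefault(topic, set()).add(doc_id)
        return judged

    def unjudged_fraction(topic, retrieved_docs, judged):
        """Fraction of a user's retrieved documents that fall outside the pool."""
        pool = judged.get(topic, set())
        outside = [d for d in retrieved_docs if d not in pool]
        return len(outside) / len(retrieved_docs) if retrieved_docs else 0.0

    # Hypothetical usage:
    # judged = load_judged_docs("qrels.trec8.adhoc.txt")
    # print(unjudged_fraction("401", ["FT911-3", "LA010189-0018"], judged))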
“…Numerous studies have demonstrated that relevance assessments do not generalize across subjects [80,129]. Indeed, it is understood that different people will make different relevance assessments given the same topics and documents.…”
Section: TREC Collections
confidence: 99%