2018
DOI: 10.1145/3239570
Evaluation-as-a-Service for the Computational Sciences

Abstract: Evaluation in empirical computer science is essential to show progress and assess the technologies developed. Several research domains such as information retrieval have long relied on systematic evaluation to measure progress: here, the Cranfield paradigm of creating shared test collections, defining search tasks, and collecting ground truth for these tasks has persisted up until now. In recent years, however, several new challenges have emerged that do not fit this paradigm very well: extremely large data sets, …

Cited by 21 publications (14 citation statements)
References 56 publications
“…Anyway, the issue of reproducibility of retrieval systems concerns the IR field as a whole, not only neural IR. Reproducibility efforts focus on several core topics in IR, ranging from reproducing baselines [145,243] and core IR components [202] to evaluation [82,114] and advanced applications [103]. Reproducibility is now a core research topic in IR, with dedicated workshops [76,14,42], a specific track at the European Conference on Information Retrieval (ECIR) since 2015, and dedicated journal special issues [77,78].…”
Section: Knowledge-Enhanced Neural IR Models
confidence: 99%
“…The PRIMAD model [8] offers orientation as to which components of an IR experiment may affect reproducibility or have to be considered when trying to reproduce the corresponding experiment. The Evaluation-as-a-Service (EaaS) paradigm [13] reverses the conventional evaluation approach of a shared task as applied, for instance, at the TREC conference. Instead of having participants submit only their results (runs), the complete retrieval system is submitted in a form that allows others to rerun it independently and reproduce the results.…”
Section: Related Work
confidence: 99%
“…One important future direction is to build extensions that would enable tasks beyond batch retrieval, for example, to support interactive retrieval (with real or simulated user input) and evaluation on private and other sensitive data. Moreover, our effort represents a first systematic attempt to embody the Evaluation-as-a-Service paradigm [7] via Docker containers. We believe that there are many possible paths forward building on the ideas presented here.…”
Section: Future Vision and Ongoing Work
confidence: 99%
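The containerized EaaS workflow the statements above describe, where the full system rather than its output is submitted so that organizers can rerun it on data the participants never see, can be sketched as a minimal submission image. This is only an illustrative configuration: the image layout, the `retrieve.py` entry point, and the `/data` and `/output` mount points are assumptions, not part of any cited system.

```dockerfile
# Hypothetical EaaS submission image (all names and paths are illustrative).
# The participant packages the complete retrieval system; evaluation
# organizers rerun it independently, mounting the (possibly private)
# test collection at /data and collecting the run file from /output.
FROM python:3.11-slim
WORKDIR /system
COPY . /system
RUN pip install --no-cache-dir -r requirements.txt
ENTRYPOINT ["python", "retrieve.py", \
            "--topics", "/data/topics.txt", \
            "--index",  "/data/index", \
            "--output", "/output/run.txt"]
```

Under this sketch, organizers would execute the submission with something like `docker run -v "$DATA:/data:ro" -v "$OUT:/output" submission-image`, so sensitive collections stay on the evaluator's side while runs remain reproducible by rerunning the same image.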