2018
DOI: 10.1145/3239570
Evaluation-as-a-Service for the Computational Sciences

Abstract: Evaluation in empirical computer science is essential to show progress and assess the technologies developed. Several research domains such as information retrieval have long relied on systematic evaluation to measure progress: here, the Cranfield paradigm of creating shared test collections, defining search tasks, and collecting ground truth for these tasks has persisted up until now. In recent years, however, several new challenges have emerged that do not fit this paradigm very well: extremely large data sets, …

Cited by 21 publications (14 citation statements)
References 56 publications
“…Anyway, the issue of reproducibility of retrieval systems concerns the IR field as a whole, not only neural IR. Reproducibility efforts focus on several core topics in IR, ranging from reproducing baselines [145,243] and core IR components [202] to evaluation [82,114] and advanced applications [103]. Reproducibility is now a core research topic in IR, with dedicated workshops [76,14,42], a specific track at the European Conference on Information Retrieval (ECIR) since 2015, and dedicated journal special issues [77,78].…”
Section: Knowledge-Enhanced Neural IR Models
confidence: 99%
“…The PRIMAD model [8] offers orientation as to which components of an IR experiment may affect reproducibility or have to be considered when trying to reproduce the corresponding experiment. The Evaluation-as-a-Service (EaaS) paradigm [13] reverses the conventional evaluation approach of a shared task as applied, for instance, at the TREC conference. Instead of having participants submit only their results (runs), the complete retrieval system is submitted in a form that allows others to rerun it independently and reproduce the results.…”
Section: Related Work
confidence: 99%
“…One important future direction is to build extensions that would enable tasks beyond batch retrieval, for example, to support interactive retrieval (with real or simulated user input) and evaluation on private and other sensitive data. Moreover, our effort represents a first systematic attempt to embody the Evaluation-as-a-Service paradigm [7] via Docker containers. We believe that there are many possible paths forward building on the ideas presented here.…”
Section: Future Vision and Ongoing Work
confidence: 99%
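The containerized EaaS workflow the statements above describe, where the full system rather than its output is submitted so that organizers can rerun it on data the participants never see, can be sketched as a minimal submission image. This is only an illustrative configuration: the image layout, the `retrieve.py` entry point, and the `/data` and `/output` mount points are assumptions, not part of any cited system.

```dockerfile
# Hypothetical EaaS submission image (all names and paths are illustrative).
# The participant packages the complete retrieval system; evaluation
# organizers rerun it independently, mounting the (possibly private)
# test collection at /data and collecting the run file from /output.
FROM python:3.11-slim
WORKDIR /system
COPY . /system
RUN pip install --no-cache-dir -r requirements.txt
ENTRYPOINT ["python", "retrieve.py", \
            "--topics", "/data/topics.txt", \
            "--index",  "/data/index", \
            "--output", "/output/run.txt"]
```

Under this sketch, organizers would execute the submission with something like `docker run -v "$DATA:/data:ro" -v "$OUT:/output" submission-image`, so sensitive collections stay on the evaluator's side while runs remain reproducible by rerunning the same image.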