Maria Maistro scite author profile

Replicability and reproducibility of experimental results are primary concerns in all the areas of science and IR is not an exception. Besides the problem of moving the field towards more reproducible experimental practices and protocols, we also face a severe methodological issue: we do not have any means to assess when reproduced is reproduced. Moreover, we lack any reproducibility-oriented dataset, which would allow us to develop such methods. To address these issues, we compare several measures to objectively quantify to what extent we have replicated or reproduced a system-oriented IR experiment. These measures operate at different levels of granularity, from the fine-grained comparison of ranked lists, to the more general comparison of the obtained effects and significant differences. Moreover, we also develop a reproducibility-oriented dataset, which allows us to validate our measures and which can also be used to develop future measures.

CCS Concepts: Information systems → Evaluation of retrieval results; Retrieval effectiveness.
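
As a rough illustration of the fine-grained end of that spectrum, the sketch below compares an original and a reproduced ranked list with Kendall's tau over the documents they share. The choice of Kendall's tau, the function name `kendall_tau`, and the document identifiers are assumptions made for illustration; the abstract does not name the specific measures it compares.

```python
from itertools import combinations


def kendall_tau(original, reproduced):
    """Kendall's tau between two rankings, restricted to shared documents.

    Both arguments are lists of document IDs ordered from best to worst,
    with no ties and no repeated IDs (an assumption of this sketch).
    """
    reproduced_pos = {doc: i for i, doc in enumerate(reproduced)}
    # Keep the original order, but only for documents found in both lists.
    shared = [doc for doc in original if doc in reproduced_pos]
    if len(shared) < 2:
        raise ValueError("need at least two shared documents")

    concordant = discordant = 0
    for a, b in combinations(shared, 2):
        # a precedes b in the original ranking; check the reproduced order.
        if reproduced_pos[a] < reproduced_pos[b]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)
```

A value of 1 means the shared documents appear in the same order in both runs, and -1 means the order is fully reversed. Coarser comparisons, such as comparing effects or significance outcomes, would need per-topic scores rather than raw rankings.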


Continuation Methods and Curriculum Learning for Learning to Rank

Ferro, Lucchese, Maistro, et al. 2018


Injecting user models and time into precision via Markov chains

Ferrante, Ferro, Maistro. 2014


We propose a family of new evaluation measures, called Markov Precision (MP), which exploits continuous-time and discrete-time Markov chains in order to inject user models into precision. Continuous-time MP behaves like time-calibrated measures, bringing the time spent by the user into the evaluation of a system; discrete-time MP behaves like traditional evaluation measures. Being part of the same Markovian framework, the time-based and rank-based versions of MP produce values that are directly comparable. We show that it is possible to recreate Average Precision (AP) using specific user models, and this helps in providing an explanation of AP in terms of user models more realistic than the ones currently used to justify it. We also propose several alternative models that take into account different possible behaviors in scanning a ranked result list. Finally, we conduct a thorough experimental evaluation of MP on standard TREC collections in order to show that MP is as reliable as other measures and we provide an example of calibration of its time parameters based on click logs from Yandex.
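
For reference, the rank-based Average Precision that the abstract says MP can recreate may be sketched as follows. This is the standard binary-relevance AP, not the Markov Precision formulation itself, which the abstract does not spell out; the function name `average_precision` and its parameters are illustrative assumptions.

```python
def average_precision(relevance, num_relevant=None):
    """Standard binary-relevance Average Precision for one ranked list.

    relevance: list of 0/1 judgments, ordered by rank (best first).
    num_relevant: total relevant documents for the topic; defaults to the
    number of relevant documents retrieved (an assumption of this sketch).
    """
    if num_relevant is None:
        num_relevant = sum(relevance)
    if num_relevant == 0:
        return 0.0

    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            # Precision at this rank, accumulated only at relevant documents.
            precision_sum += hits / rank
    return precision_sum / num_relevant
```

For example, a list with relevant documents at ranks 1 and 3 accumulates precisions 1/1 and 2/3, which are then averaged over the two relevant documents.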


Overview of CENTRE@CLEF 2018: A First Tale in the Systematic Reproducibility Realm

Ferro, Maistro, Sakai, et al. 2018



Maria Maistro

Towards a Formal Framework for Utility-oriented Measurements of Retrieval Effectiveness

How to Measure the Reproducibility of System-oriented IR Experiments
