Societal biases resonate in the retrieved contents of information retrieval (IR) systems, reinforcing existing stereotypes. Addressing this issue requires established measures of fairness with respect to the representation of various social groups in retrieved contents, as well as methods to mitigate such biases, particularly in light of the advances in deep ranking models. In this work, we first provide a novel framework to measure the fairness of the text contents retrieved by ranking models. By introducing a ranker-agnostic measurement, the framework also enables disentangling the effect of the collection on fairness from that of the rankers. Second, we propose an adversarial bias mitigation approach applied to state-of-the-art BERT rankers, which jointly learns to predict relevance and remove protected attributes. We conduct experiments on two passage retrieval collections (MS MARCO Passage Re-ranking and TREC Deep Learning 2019 Passage Re-ranking), which we extend with fairness annotations for a selected subset of queries regarding gender attributes. Our results on the MS MARCO benchmark show that, while the fairness of all ranking models is lower than that of the ranker-agnostic baselines, the fairness in retrieved contents improves significantly when the proposed adversarial training is applied. Lastly, we investigate the trade-off between fairness and utility, showing that by applying a combinatorial model selection method, we can maintain the significant improvements in fairness without any significant loss in utility.
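To make the adversarial component concrete, the following is a minimal PyTorch sketch of joint relevance/attribute-removal training for a BERT cross-encoder. It uses a gradient reversal layer, a common realization of adversarial attribute removal; all names here (`AdversarialBertRanker`, `lambda_adv`, a binary gender attribute) are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch: adversarial bias mitigation for a BERT ranker (illustrative, not the
# paper's exact setup). The encoder is trained to predict relevance while a
# gradient-reversed adversary pushes protected-attribute information out of
# the [CLS] representation.
import torch
import torch.nn as nn
from transformers import AutoModel

class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; negates and scales gradients in the backward pass."""
    @staticmethod
    def forward(ctx, x, lambda_adv):
        ctx.lambda_adv = lambda_adv
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambda_adv * grad_output, None

class AdversarialBertRanker(nn.Module):
    def __init__(self, model_name="bert-base-uncased", lambda_adv=1.0):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size
        self.relevance_head = nn.Linear(hidden, 1)   # query-passage relevance score
        self.adversary_head = nn.Linear(hidden, 2)   # protected attribute (e.g. binary gender)
        self.lambda_adv = lambda_adv

    def forward(self, input_ids, attention_mask):
        cls = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask).last_hidden_state[:, 0]
        rel_score = self.relevance_head(cls).squeeze(-1)
        # The adversary learns to detect the attribute; the reversed gradient
        # trains the encoder to remove it from the representation.
        adv_logits = self.adversary_head(GradientReversal.apply(cls, self.lambda_adv))
        return rel_score, adv_logits

def joint_loss(rel_score, rel_label, adv_logits, attr_label):
    """Joint objective: relevance loss plus adversarial attribute loss."""
    rel_loss = nn.functional.binary_cross_entropy_with_logits(rel_score, rel_label)
    adv_loss = nn.functional.cross_entropy(adv_logits, attr_label)
    # The reversal layer already flips the adversary's gradient w.r.t. the encoder.
    return rel_loss + adv_loss
```

The hyperparameter `lambda_adv` controls the strength of the adversarial signal and is one knob in the fairness-utility trade-off the abstract refers to.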
CCS CONCEPTS
• Information systems → Learning to rank; Test collections.