2020
DOI: 10.1007/978-3-030-45439-5_40

Diagnosing BERT with Retrieval Heuristics

Abstract: Word embeddings, made widely popular in 2013 with the release of word2vec, have become a mainstay of NLP engineering pipelines. Recently, with the release of BERT, word embeddings have moved from the term-based embedding space to the contextual embedding space: each term is no longer represented by a single low-dimensional vector; instead, each term and its context determine the vector weights. BERT's setup and architecture have been shown to be general enough to be applicable to many natural language tasks. I…
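To make the abstract's contrast concrete, here is a minimal sketch (not from the paper) showing that a contextual model assigns the same surface word different vectors depending on its sentence, whereas a static word2vec-style embedding would give it a single vector. It assumes the Hugging Face transformers library, PyTorch, and the bert-base-uncased checkpoint.

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def word_vector(sentence, word):
    """Last-layer hidden state of the token matching `word` in `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return hidden[tokens.index(word)]

# The same word "bank" in two different contexts yields two different vectors.
v_river = word_vector("she sat by the river bank", "bank")
v_money = word_vector("he deposited cash at the bank", "bank")
similarity = torch.nn.functional.cosine_similarity(v_river, v_money, dim=0)
print("cosine similarity between the two 'bank' vectors:", round(similarity.item(), 2))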

Cited by 29 publications (34 citation statements)
References 33 publications

“…Even though we can demonstrate promising first steps to axiomatically explain retrieval systems' result rankings, the addition of further well-grounded axiomatic constraints capturing other retrieval aspects seems to be needed to further improve the explanations. Its current limitations notwithstanding, we consider our approach a promising complement to the more tightly-controlled studies from previous work [7,32,44]. While the latter shed light on the general principles under which complex relevance scoring models operate, our axiomatic reconstruction framework could help IR system designers, or even end users, make sense of a concrete ranking for a real-world query.…”
Section: Discussion (mentioning)
confidence: 98%
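To give a flavor of what an axiomatic check against a concrete ranking can look like, the sketch below tests one classic constraint, TFC1 (among documents of comparable length, more query-term occurrences should not be ranked lower). The axiom choice, helper names, and length tolerance are illustrative assumptions, not the cited framework itself.

from collections import Counter

def term_frequency(doc_tokens, query_terms):
    """Total count of query-term occurrences in a tokenized document."""
    counts = Counter(doc_tokens)
    return sum(counts[t] for t in query_terms)

def tfc1_violations(ranking, query_terms, length_tolerance=0):
    """Count adjacent pairs in `ranking` (token lists, best first) where a
    document of roughly equal length but higher query-term frequency is
    ranked below its neighbor, i.e. TFC1 is violated."""
    violations = 0
    for higher, lower in zip(ranking, ranking[1:]):
        if abs(len(higher) - len(lower)) <= length_tolerance:
            if term_frequency(higher, query_terms) < term_frequency(lower, query_terms):
                violations += 1
    return violations

# Toy ranking (best document first) for the query "neural ranking".
query = ["neural", "ranking"]
ranked_docs = [
    "neural ranking models score documents".split(),
    "ranking with neural neural networks rocks".split(),
]
print(tfc1_violations(ranked_docs, query, length_tolerance=1))  # 1 violation

A ranking that triggers many such violations is one this axiom cannot account for; aggregating such signals over several axioms is the kind of reconstruction the quoted discussion refers to.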
“…The study's diagnostic datasets focus on only 4 simple individual axioms, which cannot completely account for neural rankers' decisions. In a follow-up publication, Câmara and Hauff [7] extend the idea to building diagnostic datasets for 9 axioms separately, with a focus on BERT-based rankers. MacAvaney et al. [32] systematize the analysis of neural IR models as a framework comprising three testing strategies: controlled manipulation of individual measurements (e.g., term frequency or document length), manipulation of document texts, and construction of tests from non-IR datasets; the influence of each strategy on neural rankers' behavior can then be investigated.…”
Section: Axioms / Sources (mentioning)
confidence: 99%
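As a rough sketch of the first of those strategies, controlled manipulation of a single measurement, one can hold a document fixed, inject extra copies of a query term, and watch how a ranker's score responds. The score() argument below stands in for any scoring function, and the toy ranker is purely illustrative, not an API from the cited work.

def tf_manipulation_probe(score, query, doc_terms, max_copies=5):
    """Controlled manipulation of term frequency: append extra copies of a
    query term to an otherwise fixed document and record how the ranker's
    score changes. `score(query, doc_text)` is assumed to return a float."""
    probe_term = query.split()[0]
    results = []
    for k in range(max_copies + 1):
        doc_text = " ".join(doc_terms + [probe_term] * k)
        results.append((k, score(query, doc_text)))
    return results

# Toy ranker standing in for a neural model: counts query-term matches.
def toy_score(query, doc_text):
    doc_tokens = doc_text.split()
    return sum(doc_tokens.count(t) for t in query.split())

curve = tf_manipulation_probe(toy_score, "neural ranking",
                              "a short passage about ranking".split())
print(curve)  # e.g. [(0, 1), (1, 2), (2, 3), ...]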
“…Negative samples are used for the pairwise loss function used to train PACRR, and BM25 results offer higher-quality negative samples than random paragraphs would (e.g., these examples have matching terms, whereas random paragraphs likely would not). For each positive sample, we include 6 negative samples. Up to a point, including more negative samples has been shown to improve the performance of PACRR at the expense of training time; we found 6 negative samples to be an effective balance between the two considerations.…”
Section: Methods (mentioning)
confidence: 99%
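A hedged sketch of how such training triples might be assembled is shown below; the 1:6 positive-to-negative ratio follows the quoted description, while the function name and sampling details are assumptions for illustration.

import random

def build_pairwise_samples(query, positive_docs, bm25_results, n_neg=6, seed=0):
    """Pair each relevant document with `n_neg` negatives drawn from the BM25
    result list (excluding known positives), as used for a pairwise loss."""
    rng = random.Random(seed)
    positives = set(positive_docs)
    candidates = [d for d in bm25_results if d not in positives]
    triples = []
    for pos in positive_docs:
        negatives = rng.sample(candidates, min(n_neg, len(candidates)))
        triples.extend((query, pos, neg) for neg in negatives)
    return triples

# Toy usage with string IDs standing in for documents.
triples = build_pairwise_samples(
    "what is bm25",
    positive_docs=["doc_rel_1"],
    bm25_results=["doc_rel_1", "doc_17", "doc_42", "doc_99",
                  "doc_3", "doc_8", "doc_5", "doc_11"],
)
print(len(triples))  # 6: one positive paired with six BM25-sourced negatives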
“…Pre-trained Language Models: Probing and Knowledge Infusion. The extensive success of pre-trained transformer-based language models such as BERT [6], RoBERTa [19], and T5 [36] can be attributed to the transformers' computational efficiency, the amount of pre-training data, and the large amount of computation used to train such models. […] [5,20], by using probing tasks [11,44] that examine BERT's representations to understand which linguistic information is encoded at which layer, and by using diagnostic datasets [4].…”
Section: Related Work (mentioning)
confidence: 99%
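For context, a probing task in this sense usually freezes the pre-trained model and fits a small classifier on the hidden states of one layer; the probe's accuracy indicates how easily the targeted linguistic property can be read off that layer. The sketch below uses scikit-learn on randomly generated stand-in vectors (real use would substitute actual BERT layer outputs and real labels); it is an illustrative setup, not the procedure of any cited work.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-ins: `layer_reprs` would normally hold the hidden states a frozen
# BERT produces for each sentence at one chosen layer, and `labels` the
# linguistic property being probed (e.g., a part-of-speech or tense label).
rng = np.random.default_rng(0)
layer_reprs = rng.normal(size=(200, 768))   # 200 sentences x 768-dim layer output
labels = rng.integers(0, 2, size=200)        # binary linguistic property

X_train, X_test, y_train, y_test = train_test_split(
    layer_reprs, labels, test_size=0.25, random_state=0
)

# The probe itself: a simple linear classifier over the frozen representations.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)
print("probe accuracy:", probe.score(X_test, y_test))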