2020
DOI: 10.1007/978-3-030-45439-5_40

Diagnosing BERT with Retrieval Heuristics

Abstract: Word embeddings, made widely popular in 2013 with the release of word2vec, have become a mainstay of NLP engineering pipelines. Recently, with the release of BERT, word embeddings have moved from the term-based embedding space to the contextual embedding space: each term is no longer represented by a single low-dimensional vector; instead, each term and its context determine the vector weights. BERT's setup and architecture have been shown to be general enough to be applicable to many natural language tasks. I…
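To make the abstract's contrast concrete, here is a minimal sketch (not from the paper) showing that a contextual model assigns the same surface word different vectors depending on its sentence, whereas a static word2vec-style embedding would give it a single vector. It assumes the Hugging Face transformers library, PyTorch, and the bert-base-uncased checkpoint.

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def word_vector(sentence, word):
    """Last-layer hidden state of the token matching `word` in `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return hidden[tokens.index(word)]

# The same word "bank" in two different contexts yields two different vectors.
v_river = word_vector("she sat by the river bank", "bank")
v_money = word_vector("he deposited cash at the bank", "bank")
similarity = torch.nn.functional.cosine_similarity(v_river, v_money, dim=0)
print("cosine similarity between the two 'bank' vectors:", round(similarity.item(), 2))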

Cited by 29 publications (34 citation statements)
References 33 publications

“…Even though we can demonstrate promising first steps to axiomatically explain retrieval systems' result rankings, the addition of further well-grounded axiomatic constraints capturing other retrieval aspects seems to be needed to further improve the explanations. Its current limitations notwithstanding, we consider our approach a promising complement to the more tightly-controlled studies from previous work [7,32,44]. While the latter shed light on the general principles under which complex relevance scoring models operate, our axiomatic reconstruction framework could help IR system designers, or even end users, make sense of a concrete ranking for a real-world query.…”
Section: Discussion (mentioning)
confidence: 98%
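To give a flavor of what an axiomatic check against a concrete ranking can look like, the sketch below tests one classic constraint, TFC1 (among documents of comparable length, more query-term occurrences should not be ranked lower). The axiom choice, helper names, and length tolerance are illustrative assumptions, not the cited framework itself.

from collections import Counter

def term_frequency(doc_tokens, query_terms):
    """Total count of query-term occurrences in a tokenized document."""
    counts = Counter(doc_tokens)
    return sum(counts[t] for t in query_terms)

def tfc1_violations(ranking, query_terms, length_tolerance=0):
    """Count adjacent pairs in `ranking` (token lists, best first) where a
    document of roughly equal length but higher query-term frequency is
    ranked below its neighbor, i.e. TFC1 is violated."""
    violations = 0
    for higher, lower in zip(ranking, ranking[1:]):
        if abs(len(higher) - len(lower)) <= length_tolerance:
            if term_frequency(higher, query_terms) < term_frequency(lower, query_terms):
                violations += 1
    return violations

# Toy ranking (best document first) for the query "neural ranking".
query = ["neural", "ranking"]
ranked_docs = [
    "neural ranking models score documents".split(),
    "ranking with neural neural networks rocks".split(),
]
print(tfc1_violations(ranked_docs, query, length_tolerance=1))  # 1 violation

A ranking that triggers many such violations is one this axiom cannot account for; aggregating such signals over several axioms is the kind of reconstruction the quoted discussion refers to.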
“…The study's diagnostic datasets focus on only 4 simple individual axioms, which cannot completely account for neural rankers' decisions. In a follow-up publication, Câmara and Hauff [7] extend the idea to building diagnostic datasets for 9 axioms separately, with a focus on BERT-based rankers. MacAvaney et al. [32] systematize the analysis of neural IR models as a framework comprising three testing strategies: controlled manipulation of individual measurements (e.g., term frequency or document length), manipulation of document texts, and construction of tests from non-IR datasets; the influence of each strategy on neural rankers' behavior can then be investigated.…”
Section: Axioms / Sources (mentioning)
confidence: 99%
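As a rough sketch of the first of those strategies, controlled manipulation of a single measurement, one can hold a document fixed, inject extra copies of a query term, and watch how a ranker's score responds. The score() argument below stands in for any scoring function, and the toy ranker is purely illustrative, not an API from the cited work.

def tf_manipulation_probe(score, query, doc_terms, max_copies=5):
    """Controlled manipulation of term frequency: append extra copies of a
    query term to an otherwise fixed document and record how the ranker's
    score changes. `score(query, doc_text)` is assumed to return a float."""
    probe_term = query.split()[0]
    results = []
    for k in range(max_copies + 1):
        doc_text = " ".join(doc_terms + [probe_term] * k)
        results.append((k, score(query, doc_text)))
    return results

# Toy ranker standing in for a neural model: counts query-term matches.
def toy_score(query, doc_text):
    doc_tokens = doc_text.split()
    return sum(doc_tokens.count(t) for t in query.split())

curve = tf_manipulation_probe(toy_score, "neural ranking",
                              "a short passage about ranking".split())
print(curve)  # e.g. [(0, 1), (1, 2), (2, 3), ...]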
“…Negative samples are used for the pairwise loss function used to train PACRR, and BM25 results offer higher-quality negative samples than random paragraphs would (e.g., these examples have matching terms, whereas random paragraphs likely would not). For each positive sample, we include 6 negative samples. Up to a point, including more negative samples has been shown to improve the performance of PACRR at the expense of training time; we found 6 negative samples to be an effective balance between the two considerations.…”
Section: Methods (mentioning)
confidence: 99%
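A hedged sketch of how such training triples might be assembled is shown below; the 1:6 positive-to-negative ratio follows the quoted description, while the function name and sampling details are assumptions for illustration.

import random

def build_pairwise_samples(query, positive_docs, bm25_results, n_neg=6, seed=0):
    """Pair each relevant document with `n_neg` negatives drawn from the BM25
    result list (excluding known positives), as used for a pairwise loss."""
    rng = random.Random(seed)
    positives = set(positive_docs)
    candidates = [d for d in bm25_results if d not in positives]
    triples = []
    for pos in positive_docs:
        negatives = rng.sample(candidates, min(n_neg, len(candidates)))
        triples.extend((query, pos, neg) for neg in negatives)
    return triples

# Toy usage with string IDs standing in for documents.
triples = build_pairwise_samples(
    "what is bm25",
    positive_docs=["doc_rel_1"],
    bm25_results=["doc_rel_1", "doc_17", "doc_42", "doc_99",
                  "doc_3", "doc_8", "doc_5", "doc_11"],
)
print(len(triples))  # 6: one positive paired with six BM25-sourced negatives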
“…Pre-trained Language Models: Probing and Knowledge Infusion. The extensive success of pre-trained transformer-based language models such as BERT [6], RoBERTa [19], and T5 [36] can be attributed to the transformers' computational efficiency, the amount of pre-training data, and the large amount of computation used to train such models. […] [5,20], by using probing tasks [11,44] that examine BERT's representations to understand which linguistic information is encoded at which layer, and by using diagnostic datasets [4].…”
Section: Related Work (mentioning)
confidence: 99%
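For context, a probing task in this sense usually freezes the pre-trained model and fits a small classifier on the hidden states of one layer; the probe's accuracy indicates how easily the targeted linguistic property can be read off that layer. The sketch below uses scikit-learn on randomly generated stand-in vectors (real use would substitute actual BERT layer outputs and real labels); it is an illustrative setup, not the procedure of any cited work.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-ins: `layer_reprs` would normally hold the hidden states a frozen
# BERT produces for each sentence at one chosen layer, and `labels` the
# linguistic property being probed (e.g., a part-of-speech or tense label).
rng = np.random.default_rng(0)
layer_reprs = rng.normal(size=(200, 768))   # 200 sentences x 768-dim layer output
labels = rng.integers(0, 2, size=200)        # binary linguistic property

X_train, X_test, y_train, y_test = train_test_split(
    layer_reprs, labels, test_size=0.25, random_state=0
)

# The probe itself: a simple linear classifier over the frozen representations.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)
print("probe accuracy:", probe.score(X_test, y_test))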