Proceedings of the Third Workshop on Insights From Negative Results in NLP 2022
DOI: 10.18653/v1/2022.insights-1.1

On Isotropy Calibration of Transformer Models

Abstract: Different studies of the embedding space of transformer models suggest that the distribution of contextual representations is highly anisotropic: the embeddings are distributed in a narrow cone. Meanwhile, static word representations (e.g., Word2Vec or GloVe) have been shown to benefit from isotropic spaces. Therefore, previous work has developed methods to calibrate the embedding space of transformers in order to ensure isotropy. However, a recent study (Cai et al., 2021) shows that the embedding space of transformers…
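The anisotropy described in the abstract is usually quantified with simple geometric proxies. Below is a minimal sketch (not code from the paper) of one common proxy: the average cosine similarity between randomly sampled pairs of contextual embeddings, which stays close to 0 in an isotropic space and approaches 1 when the vectors occupy a narrow cone. The array name and file path in the usage comment are hypothetical.

```python
# Minimal sketch of a common anisotropy proxy: mean cosine similarity between
# randomly sampled pairs of embeddings. Close to 0 => roughly isotropic;
# close to 1 => the "narrow cone" geometry described in the abstract.
import numpy as np

def mean_pairwise_cosine(embeddings: np.ndarray, n_pairs: int = 10_000,
                         seed: int = 0) -> float:
    """Estimate anisotropy from random embedding pairs (sampled with replacement)."""
    rng = np.random.default_rng(seed)
    n = embeddings.shape[0]
    i = rng.integers(0, n, size=n_pairs)
    j = rng.integers(0, n, size=n_pairs)
    x, y = embeddings[i], embeddings[j]
    cos = np.sum(x * y, axis=1) / (
        np.linalg.norm(x, axis=1) * np.linalg.norm(y, axis=1))
    return float(cos.mean())

# Hypothetical usage: `reps` would be a (num_tokens, hidden_dim) array of
# contextual vectors collected from one layer of a Transformer model.
# reps = np.load("contextual_embeddings.npy")
# print(mean_pairwise_cosine(reps))
```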

Cited by 4 publications (5 citation statements) · References 16 publications
“…Cai et al. (2020) showed that, in spite of BERT embeddings having global anisotropy, each cluster in the embedding space is isotropic, and that this local isotropy could be enough for Transformer models to achieve their full representation power. This hypothesis is supported by recent empirical results from Ding et al. (2022). If the anisotropy comes from the existence of different clusters, and these clusters encode non-semantic information like token frequency, this can be matched with the biases described by Jiang et al. (2022) and the representation degeneration described by Gao et al. (2019).…”
Section: Related Work (supporting)
confidence: 65%
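To make the local-isotropy argument quoted above concrete, here is a small sketch (my own illustration, not Cai et al.'s code) that clusters the embeddings and compares the global anisotropy proxy with the proxy computed inside each mean-centered cluster; under the quoted hypothesis, the within-cluster values should be much closer to 0. It reuses the hypothetical mean_pairwise_cosine helper from the sketch after the abstract.

```python
# Sketch: compare global anisotropy with within-cluster anisotropy.
# Assumes mean_pairwise_cosine() from the earlier sketch is in scope.
import numpy as np
from sklearn.cluster import KMeans

def global_vs_local_anisotropy(embeddings: np.ndarray, n_clusters: int = 10,
                               seed: int = 0) -> tuple[float, float]:
    global_score = mean_pairwise_cosine(embeddings)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(embeddings)
    local_scores = []
    for k in range(n_clusters):
        cluster = embeddings[labels == k]
        if len(cluster) < 2:
            continue
        # Shift the cluster to the origin before measuring its isotropy,
        # so the cluster's offset from the global mean does not dominate.
        local_scores.append(mean_pairwise_cosine(cluster - cluster.mean(axis=0)))
    return global_score, float(np.mean(local_scores))
```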
“…That is, anisotropy is not a problem if it is the same for all tokens. If this is true, then isotropy correction techniques should not improve the performance of these models on semantic tasks, which has been shown empirically by Ding et al. (2022) and Jiang et al. (2022). In the next set of experiments, we further support this idea through empirical evidence.…”
Section: Conclusion on Bias Analysis (supporting)
confidence: 53%
“…To this end, they propose regularization terms that hamper the singular value decay of the embedding matrix. However, despite the success of these optimization tricks in lowering the anisotropy of Transformer representations, Ding et al. (2022) have recently shown that they do not bring any improvement on several tasks such as summarization and sentence similarity (STS). They even observed some deterioration in performance caused by the anisotropy mitigation techniques.…”
Section: Introduction (mentioning)
confidence: 99%
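For context, isotropy calibration is applied both through training-time regularizers like those discussed in the quote above and as post-hoc transformations of the embedding space. The sketch below is my own illustration of one simple post-hoc variant (centering and removing the top principal directions, in the spirit of "all-but-the-top" post-processing), not the specific regularization terms the cited works propose.

```python
# Sketch of a simple post-hoc isotropy calibration: center the embeddings and
# project out their dominant principal directions, which flattens the spectrum.
import numpy as np

def remove_top_directions(embeddings: np.ndarray, n_components: int = 3) -> np.ndarray:
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal directions of the centered embedding matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top = vt[:n_components]                     # (n_components, hidden_dim)
    return centered - centered @ top.T @ top    # subtract the projections
```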
“…Following Cai et al. (2021), this global estimate of anisotropy does not rule out the possibility of distinct and locally isotropic clusters in the embedding space. Ding et al. (2022) show that isotropy calibration methods (Gao et al., 2019; Li et al., 2020) do not lead to consistent improvements on downstream tasks when models already benefit from local isotropy.…”
(mentioning)
confidence: 98%