2020
DOI: 10.48550/arxiv.2007.01852
Preprint

Language-agnostic BERT Sentence Embedding

Abstract: We adapt multilingual BERT (Devlin et al., 2019) to produce language-agnostic sentence embeddings for 109 languages. While English sentence embeddings have been obtained by fine-tuning a pretrained BERT model (Reimers and Gurevych, 2019), such models have not been applied to multilingual sentence embeddings. Our model combines masked language model (MLM) and translation language model (TLM) (Conneau and Lample, 2019) pretraining with a translation ranking task using bi-directional dual encoders (Yang et al.,…
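The translation ranking task named in the abstract trains the dual encoder to rank the true translation of a sentence above the other sentences in the batch, in both directions. Below is a minimal PyTorch sketch of such an in-batch ranking loss with an additive margin; the function name, margin value, and use of cosine similarity are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn.functional as F

def translation_ranking_loss(src_emb, tgt_emb, margin=0.3):
    """Bidirectional in-batch translation ranking loss with an
    additive margin (a sketch of the objective the abstract
    describes). src_emb and tgt_emb are (batch, dim) embeddings
    of parallel sentences; row i of each is a translation pair."""
    # Cosine similarity between every source and every target.
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    scores = src @ tgt.t()  # (batch, batch)

    # Subtract the margin from the matching (diagonal) pairs only,
    # making the ranking task harder for the true translation.
    batch = scores.size(0)
    scores = scores - margin * torch.eye(batch, device=scores.device)

    # Each source must rank its own translation above the other
    # in-batch targets, and vice versa (the "bi-directional" part).
    labels = torch.arange(batch, device=scores.device)
    loss_src2tgt = F.cross_entropy(scores, labels)
    loss_tgt2src = F.cross_entropy(scores.t(), labels)
    return (loss_src2tgt + loss_tgt2src) / 2
```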

Cited by 108 publications (177 citation statements) | References 15 publications

Citation statements (ordered by relevance):
“…The same trends in the results were observed with other standard multilingual sentence encoders such as LaBSE (Feng et al., 2020); see Appendix C for additional results.…”
supporting
confidence: 80%
“…Besides the multilingual version of the classic BERT, there is also a Language-agnostic BERT Sentence Embedding (LaBSE) [19]. It is a multilingual version of BERT with sentence embeddings similar to Sentence-BERT [36].…”
Section: Transformers and BERT
mentioning
confidence: 99%
“…• LaBSE [13], a language-agnostic BERT sentence embedding model pre-trained on texts in 109 languages.…”
Section: Models
mentioning
confidence: 99%
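The statements above use LaBSE as an off-the-shelf multilingual encoder. A minimal sketch of that usage follows, assuming the publicly distributed sentence-transformers checkpoint (model name sentence-transformers/LaBSE); the example sentences are illustrative.

```python
from sentence_transformers import SentenceTransformer, util

# Load the public LaBSE checkpoint (covers 109 languages).
model = SentenceTransformer("sentence-transformers/LaBSE")

# Embed the same sentence in several languages; LaBSE maps
# translations to nearby points in a shared embedding space.
sentences = [
    "The cat sits on the mat.",        # English
    "Die Katze sitzt auf der Matte.",  # German
    "Le chat est assis sur le tapis.", # French
]
embeddings = model.encode(sentences, normalize_embeddings=True)

# Cosine similarity between translation pairs should be high.
print(util.cos_sim(embeddings, embeddings))
```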