Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019), 2019
DOI: 10.18653/v1/w19-4309
Learning Bilingual Sentence Embeddings via Autoencoding and Computing Similarities with a Multilayer Perceptron

Abstract: We propose a novel model architecture and training algorithm to learn bilingual sentence embeddings from a combination of parallel and monolingual data. Our method connects autoencoding and neural machine translation to force the source and target sentence embeddings to share the same space without the help of a pivot language or an additional transformation. We train a multilayer perceptron on top of the sentence embeddings to extract good bilingual sentence pairs from nonparallel or noisy parallel data. Our …
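The abstract's second stage, an MLP scoring sentence pairs over the shared bilingual embedding space, can be sketched as follows. This is a hypothetical illustration, not the authors' code: the pair features, dimensions, and layer sizes are all assumptions.

```python
# Hypothetical sketch: an MLP that scores a (source, target) sentence-embedding
# pair as parallel vs. non-parallel, as the abstract describes. Feature
# construction and sizes are assumptions, not the paper's exact setup.
import torch
import torch.nn as nn

class PairClassifier(nn.Module):
    def __init__(self, emb_dim: int = 512, hidden: int = 256):
        super().__init__()
        # Common pair features: both embeddings plus elementwise interactions
        # (an assumption; the paper may use a different feature set).
        self.mlp = nn.Sequential(
            nn.Linear(4 * emb_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, src: torch.Tensor, tgt: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([src, tgt, src * tgt, (src - tgt).abs()], dim=-1)
        return torch.sigmoid(self.mlp(feats)).squeeze(-1)  # P(pair is parallel)

# Usage: score candidate pairs and keep those above a threshold.
clf = PairClassifier()
src_emb = torch.randn(8, 512)  # embeddings from the shared bilingual space
tgt_emb = torch.randn(8, 512)
keep = clf(src_emb, tgt_emb) > 0.5
```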

Cited by 3 publications (5 citation statements) · References 29 publications
“…Recently, bilingual sentence or word embeddings have been used to calculate the similarity of a sentence pair [1,3,4,7,8,13,14,23,25,27,32]. [18] built a bilingual representation of a sentence by averaging pre-trained bilingual word embeddings.…”
Section: Related Work (mentioning)
confidence: 99%
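The averaging approach attributed to [18] in this snippet can be illustrated with a short sketch; the `bilingual_vecs` lookup table and the 300-dimensional vectors are assumed stand-ins for real pre-trained bilingual word embeddings.

```python
# Hypothetical sketch of the averaging approach: a sentence embedding as the
# mean of pre-trained bilingual word vectors. The lookup table below is a
# random stand-in, not a real pre-trained model.
import numpy as np

bilingual_vecs = {"hello": np.random.rand(300), "world": np.random.rand(300)}

def sentence_embedding(tokens: list[str]) -> np.ndarray:
    vecs = [bilingual_vecs[t] for t in tokens if t in bilingual_vecs]
    return np.mean(vecs, axis=0) if vecs else np.zeros(300)

print(sentence_embedding(["hello", "world"]).shape)  # (300,)
```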
“…This problem may also appear in other distant language pairs, such as English-Japanese, English-Korean, and so on. To address this problem, [23] learned bilingual sentence embeddings from a combination of parallel and monolingual data. They then connected autoencoding and neural machine translation to force the source and target sentence embeddings to share the same space without the help of a pivot language or an additional transformation.…”
Section: Introduction (mentioning)
confidence: 99%
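A self-contained sketch of the training signal this snippet describes, under several assumptions (toy GRU encoders, a single shared decoder, random token batches): because one decoder consumes embeddings from both encoders, the translation and autoencoding losses pull the two embedding spaces together without a pivot language or an extra transformation. This illustrates the idea, not the authors' implementation.

```python
# Minimal sketch (assumptions throughout: toy vocab/hidden sizes, GRU modules)
# of tying two encoders to one space via a shared decoder.
import torch
import torch.nn as nn

V, E, H = 1000, 64, 128  # toy vocab / embedding / hidden sizes

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb, self.rnn = nn.Embedding(V, E), nn.GRU(E, H, batch_first=True)
    def forward(self, x):                  # x: (batch, seq)
        _, h = self.rnn(self.emb(x))
        return h[-1]                       # fixed-size sentence embedding

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(V, E)
        self.rnn = nn.GRU(E, H, batch_first=True)
        self.out = nn.Linear(H, V)
    def nll(self, z, y):                   # condition on sentence embedding z
        o, _ = self.rnn(self.emb(y[:, :-1]), z.unsqueeze(0))
        return nn.functional.cross_entropy(
            self.out(o).reshape(-1, V), y[:, 1:].reshape(-1))

enc_src, enc_tgt, dec = Encoder(), Encoder(), Decoder()  # decoder is shared
src = torch.randint(0, V, (4, 10))   # parallel source batch
tgt = torch.randint(0, V, (4, 10))   # parallel target batch
mono = torch.randint(0, V, (4, 10))  # monolingual target batch
# NMT loss (src embedding -> tgt) + autoencoding loss (tgt embedding -> tgt):
loss = dec.nll(enc_src(src), tgt) + dec.nll(enc_tgt(mono), mono)
loss.backward()  # gradients push both encoders toward one shared space
```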
“…Currently, most sentence similarity studies focus on monolingual settings and have achieved good results, and cross-lingual sentence similarity has also performed well on some high-resource languages [14]. In Xinjiang, by contrast, cross-lingual sentence similarity research on low-resource languages, mainly Uyghur, still needs more attention.…”
Section: Introduction (mentioning)
confidence: 99%
“…Cross-lingual sentence representation models (Schwenk and Douze, 2017; España-Bonet et al., 2017; Yu et al., 2018; Devlin et al., 2019; Chidambaram et al., 2019; Artetxe and Schwenk, 2019b; Kim et al., 2019; Sabet et al., 2019; Conneau and Lample, 2019; Feng et al., 2020; Li and Mak, 2020 [https://github.com/Mao-KU/lightweight-crosslingual-sent2vec]) learn language-agnostic representations facilitating tasks like cross-lingual sentence retrieval (XSR) and cross-lingual knowledge transfer on downstream tasks without the need for training a new monolingual representation model from scratch. Thus, such models benefit from an increased amount of data during training and lead to improved performance for low-resource languages.…”
Section: Introduction (mentioning)
confidence: 99%
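Cross-lingual sentence retrieval (XSR), mentioned in this snippet, reduces to nearest-neighbor search in the shared embedding space. A minimal illustration with random stand-in matrices in place of real model output:

```python
# Illustrative XSR step under assumed inputs: given language-agnostic
# embeddings, retrieve for each source sentence its nearest target sentence
# by cosine similarity.
import numpy as np

src = np.random.rand(5, 512)   # source-language sentence embeddings
tgt = np.random.rand(7, 512)   # target-language sentence embeddings

src_n = src / np.linalg.norm(src, axis=1, keepdims=True)
tgt_n = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
nearest = (src_n @ tgt_n.T).argmax(axis=1)  # best target index per source
print(nearest)
```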