2018
DOI: 10.48550/arxiv.1803.01400
Preprint
Concatenated Power Mean Word Embeddings as Universal Cross-Lingual Sentence Representations

Abstract: Average word embeddings are a common baseline for more sophisticated sentence embedding techniques. However, they typically fall short of the performance of more complex models such as InferSent. Here, we generalize the concept of average word embeddings to power mean word embeddings. We show that the concatenation of different types of power mean word embeddings considerably closes the gap to state-of-the-art methods monolingually and substantially outperforms these more complex techniques cross-lingually. In…
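As a rough illustration of the idea sketched in the abstract, the snippet below (Python/NumPy, not the authors' released code) computes element-wise power means of a sentence's word vectors for several powers p and concatenates the results across embedding spaces. The lookup dictionaries, the default power set {1, +inf, -inf}, and the sign-preserving root used for odd powers are assumptions made for this example.

```python
import numpy as np

def power_mean(vectors, p):
    """Element-wise power mean of a list of word vectors.
    p = 1 is the arithmetic mean; p = +inf / -inf reduce to max / min pooling."""
    X = np.stack(vectors)                      # (num_words, dim)
    if p == float("inf"):
        return X.max(axis=0)
    if p == float("-inf"):
        return X.min(axis=0)
    m = np.mean(np.power(X, p), axis=0)
    # Sign-preserving root keeps odd powers real for negative components
    # (an implementation choice of this sketch, not necessarily the paper's).
    return np.sign(m) * np.abs(m) ** (1.0 / p)

def sentence_embedding(tokens, embedding_spaces,
                       powers=(1.0, float("inf"), float("-inf"))):
    """Concatenate power means over several embedding spaces.
    `embedding_spaces`: list of dicts mapping token -> np.ndarray (assumed here)."""
    parts = []
    for lookup in embedding_spaces:
        vectors = [lookup[t] for t in tokens if t in lookup]
        parts.extend(power_mean(vectors, p) for p in powers)
    return np.concatenate(parts)               # dim = len(powers) * sum of space dims
```

With, for instance, three 300-dimensional embedding spaces and three powers, this produces a 2,700-dimensional sentence vector, which is the kind of concatenated representation the abstract refers to.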

Cited by 20 publications (29 citation statements)
References 23 publications
“…GloVe embedding; b) SIF [11]: derived from an improved random-walk model; consists of two parts, weighted averaging of word vectors and first principal component removal; c) p-means [14]: concatenates different word embedding models and different power ratios; d) DCT [15]: introduces the discrete cosine transform into sentence sequential modeling; e) VLAWE [18]: introduces VLAD (vector of locally aggregated descriptors) into the sentence embedding field; 2) Parameterized Models: a) Skip-thought [5]: extends word2vec's unsupervised training objective from the word level to the sentence level; b) InferSent [6]: a bi-directional LSTM encoder trained on high-quality sentence inference data; c) Sent2Vec [21]: learns n-gram word representations and uses their average as the sentence representation.…”
Section: Methods (mentioning)
Confidence: 99%
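For context on item (b) in the statement above, here is a minimal sketch of a SIF-style pipeline: frequency-weighted averaging of word vectors followed by removal of the first principal component. The token-to-vector lookup, the unigram probability table, and the smoothing constant `a` are illustrative assumptions, not taken from the cited paper's code.

```python
import numpy as np

def sif_embeddings(sentences, lookup, unigram_prob, a=1e-3):
    """SIF-style sentence embeddings: weight each word vector by a / (a + p(w)),
    average per sentence, then subtract the projection onto the first
    principal component of the resulting sentence matrix."""
    embs = []
    for tokens in sentences:
        weighted = [lookup[w] * (a / (a + unigram_prob.get(w, 0.0)))
                    for w in tokens if w in lookup]
        embs.append(np.mean(weighted, axis=0))
    X = np.stack(embs)                               # (num_sentences, dim)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    pc = vt[0]                                       # first principal direction
    return X - np.outer(X @ pc, pc)                  # remove the common component
```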
“…Experimental results on supervised tasks are shown in Table II. The S3E method outperforms all non-parameterized models, including DCT [15], VLAWE [18] and p-means [14].…” (footnote 4: https://github.com/facebookresearch/SentEval)
Section: B. Supervised Tasks (mentioning)
Confidence: 99%
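The footnote above points to SentEval, the toolkit commonly used for these supervised comparisons. The snippet below is a hedged usage sketch following the general pattern of the facebookresearch/SentEval README; the placeholder hashed bag-of-words encoder, the data path, and the chosen tasks are assumptions for illustration and would be replaced by the encoder under evaluation (e.g. p-means).

```python
import numpy as np
import senteval  # https://github.com/facebookresearch/SentEval

def prepare(params, samples):
    # Optional hook: build vocabulary or statistics over all task samples.
    return

def batcher(params, batch):
    # `batch` is a list of tokenized sentences; return one vector per sentence.
    # Placeholder encoder: hashed bag-of-words; swap in p-means, SIF, etc.
    vecs = []
    for tokens in batch:
        v = np.zeros(300)
        for t in tokens:
            v[hash(t) % 300] += 1.0
        vecs.append(v)
    return np.vstack(vecs)

params = {'task_path': 'PATH_TO_SENTEVAL_DATA', 'usepytorch': False, 'kfold': 10}
params['classifier'] = {'nhid': 0, 'optim': 'adam', 'batch_size': 64,
                        'tenacity': 4, 'epoch_size': 2}

se = senteval.engine.SE(params, batcher, prepare)
results = se.eval(['MR', 'CR', 'SUBJ', 'SST2'])  # a subset of the supervised tasks
print(results)
```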
“…Regarding the former, large RNNs are by far the most popular (Conneau et al., 2017; Kiros et al., 2015; Tang et al., 2017; Nie et al., 2017; Hill et al., 2016; McCann et al., 2017; Peters et al., 2018; Logeswaran & Lee, 2018), followed by convolutional neural networks (Gan et al., 2017). A third group comprises efficient methods that aggregate word embeddings (Wieting et al., 2016; Arora et al., 2017; Pagliardini et al., 2018; Rücklé et al., 2018). Most of the methods in the latter group are word order agnostic.…”
Section: Related Work (mentioning)
Confidence: 99%
“…Despite CBOW's simplicity, it attains strong results on many downstream tasks. Using sophisticated weighting schemes, the performance of aggregated word embeddings can be further increased (Arora et al., 2017), coming even close to strong LSTM baselines (Rücklé et al., 2018; Henao et al., 2018) such as InferSent (Conneau et al., 2017). This raises the question of how much benefit recurrent encoders actually provide over simple word-embedding-based methods (Wieting & Kiela, 2019).…”
Section: Introduction (mentioning)
Confidence: 99%