2015
DOI: 10.1162/tacl_a_00143
From Paraphrase Database to Compositional Paraphrase Model and Back

Abstract: The Paraphrase Database (PPDB; Ganitkevitch et al., 2013) is an extensive semantic resource, consisting of a list of phrase pairs with (heuristic) confidence estimates. However, it is still unclear how it can best be used, due to the heuristic nature of the confidences and its necessarily incomplete coverage. We propose models to leverage the phrase pairs from the PPDB to build parametric paraphrase models that score paraphrase pairs more accurately than the PPDB's internal scores while simultaneously improving its coverage.
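The parametric models the abstract describes can be pictured with a short sketch: compose a phrase embedding from word embeddings and train so that PPDB phrase pairs outscore random negatives under a margin loss. Everything below (toy vocabulary, random initialization, hyperparameters, word-averaging composition) is an illustrative assumption rather than the paper's exact setup, since the paper also explores richer compositional models:

```python
import numpy as np

# Minimal sketch of a margin-based paraphrase model trained on PPDB-style
# phrase pairs. Toy vocabulary, random initialization, and hyperparameters
# are illustrative assumptions; composition is plain word averaging here.

rng = np.random.default_rng(0)
vocab = {w: i for i, w in enumerate(
    ["can", "not", "cannot", "unable", "to", "be", "happy"])}
dim, margin = 25, 0.4
W = rng.normal(scale=0.1, size=(len(vocab), dim))  # trainable word embeddings

def embed(phrase):
    """Compose a phrase vector by averaging its word vectors."""
    return W[[vocab[w] for w in phrase.split()]].mean(axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

# A PPDB-style paraphrase pair plus a sampled negative example.
pos1, pos2, neg = "can not", "cannot", "happy to be"

# Hinge loss: the true pair must outscore the negative by at least `margin`.
loss = max(0.0, margin
                - cosine(embed(pos1), embed(pos2))
                + cosine(embed(pos1), embed(neg)))
print(f"pair score = {cosine(embed(pos1), embed(pos2)):.3f}, loss = {loss:.3f}")
# Training would minimize this loss over all PPDB pairs by gradient descent,
# pushing paraphrases above sampled non-paraphrases.
```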

Cited by 211 publications (251 citation statements).
References 25 publications (56 reference statements).
“…objectives based on the distributional hypothesis are probably not to blame, as word vectors trained without relying on the distributional hypothesis, such as those of Wieting et al. (2015), still exhibit non-normality to some degree. The actual causes remain to be determined.…”
Section: Discussion
confidence: 99%
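The excerpt's empirical claim, that trained word vectors deviate from normality regardless of training objective, is easy to probe. A minimal sketch (the random matrix is a stand-in for real embeddings, which is an assumption here) measures per-dimension skewness and excess kurtosis; values far from 0 indicate non-normality:

```python
import numpy as np

# Crude per-dimension normality check for word vectors: for a Gaussian,
# skewness and excess kurtosis are both ~0. The random matrix below is a
# stand-in for real embeddings, which would be loaded from disk.
rng = np.random.default_rng(0)
E = rng.standard_normal((10000, 300)) ** 3  # heavy-tailed placeholder

centered = E - E.mean(axis=0)
std = centered.std(axis=0)
skew = (centered ** 3).mean(axis=0) / std ** 3
kurt = (centered ** 4).mean(axis=0) / std ** 4 - 3.0  # excess kurtosis

print(f"mean |skewness| across dimensions: {np.abs(skew).mean():.2f}")
print(f"mean excess kurtosis across dimensions: {kurt.mean():.2f}")
```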
“…(3) in our case, both the encoder and the inverse of the decoder are capable of producing a vector representation per time step in a given sentence; although during training only the last one is regarded as the sentence representation for the sake of training speed, it is more reasonable to make use of the representations at all time steps, with various pooling functions, to compute high-quality sentence representations that excel at downstream tasks.…”
Section: Representation Pooling
confidence: 99%
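The excerpt's point, that pooling over every time step can beat keeping only the final state, is easy to make concrete. In this sketch the hidden states are random stand-ins for a real encoder's outputs:

```python
import numpy as np

# Pooling per-time-step encoder states into one sentence vector.
# H stands in for the (T, d) hidden states a real encoder would produce.
rng = np.random.default_rng(0)
T, d = 7, 16          # sentence length and hidden size (illustrative)
H = rng.standard_normal((T, d))

last_state = H[-1]              # what the excerpt says training uses
mean_pool  = H.mean(axis=0)     # averages information from every time step
max_pool   = H.max(axis=0)      # keeps the strongest activation per dimension

# A common trick is to concatenate several pooled views into one vector.
sentence_vec = np.concatenate([last_state, mean_pool, max_pool])
print(sentence_vec.shape)  # (48,) = 3 * d
```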
“…WME approximates a kernel derived from WMD with a set of random documents. Other word embeddings (Pennington et al., 2014; Wieting et al., 2015b) could also be utilized.…”
Section: Word2vec and Word Mover's Distance
confidence: 99%
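The excerpt compresses the Word Mover's Embedding idea: approximate the WMD-derived kernel with random features, one per randomly drawn short document. The sketch below is an assumption-laden illustration, substituting a cheap "relaxed" nearest-neighbor transport cost for exact WMD and random word sets for sampled documents:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_random_docs, gamma = 50, 64, 0.1  # illustrative settings

def relaxed_wmd(A, B):
    """Cheap stand-in for WMD: each word travels to its nearest counterpart;
    the max over the two directions lower-bounds the true transport cost."""
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return max(D.min(axis=1).mean(), D.min(axis=0).mean())

def wme_features(doc, random_docs):
    """Map a document (n_words x dim) to one feature per random document,
    so a dot product of feature vectors approximates the WMD-derived kernel."""
    R = len(random_docs)
    return np.array([np.exp(-gamma * relaxed_wmd(doc, w))
                     for w in random_docs]) / np.sqrt(R)

# Random "documents": short sets of random word vectors (real WME samples
# these from the embedding space; the lengths here are arbitrary).
random_docs = [rng.standard_normal((rng.integers(1, 6), dim))
               for _ in range(n_random_docs)]

doc_a = rng.standard_normal((8, dim))
doc_b = doc_a + 0.05 * rng.standard_normal((8, dim))  # near-duplicate of a
doc_c = rng.standard_normal((12, dim))                # unrelated document

fa, fb, fc = (wme_features(d, random_docs) for d in (doc_a, doc_b, doc_c))
print("k(a, b) ~", float(fa @ fb))  # should come out larger than k(a, c)
print("k(a, c) ~", float(fa @ fc))
```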
“…We compare WME against 10 supervised, semi-supervised, and unsupervised methods for performing textual similarity tasks. Six supervised methods are initialized with Paragram-SL999 (PSL) word vectors (Wieting et al., 2015b) and then trained on the PPDB dataset, including: 1) PARAGRAM-PHRASE (PP) (Wieting et al., 2015a). Setup: there are in total 22 textual similarity datasets from the STS tasks (2012-2015) (Agirre et al., 2012, 2013, 2014, 2015), the SemEval 2014 Semantic Relatedness task (Marelli et al., 2014), and the SemEval 2015 Twitter task (Xu et al., 2015).…”
Section: Comparisons on Textual Similarity Tasks
confidence: 99%
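The evaluation protocol this excerpt sketches is standard for STS-style tasks: embed each sentence, score pairs by cosine similarity, and report the Pearson correlation against gold similarity judgments. A minimal version follows; random vectors stand in for trained PSL/Paragram embeddings, and the gold scores are invented for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins: random vectors replace trained PSL/Paragram embeddings, and the
# gold similarity scores below are invented for illustration only.
vocab = {w: i for i, w in enumerate(
    "a man is playing the guitar dog runs fast outside".split())}
W = rng.standard_normal((len(vocab), 50))

pairs = [("a man is playing the guitar", "the man is playing guitar", 4.8),
         ("the dog runs fast", "a dog runs outside", 3.5),
         ("a man is playing the guitar", "the dog runs fast", 0.5)]

def embed(sentence):
    """PARAGRAM-PHRASE-style composition: average the word vectors."""
    return W[[vocab[w] for w in sentence.split()]].mean(axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

predicted = [cosine(embed(s1), embed(s2)) for s1, s2, _ in pairs]
gold = [g for _, _, g in pairs]

# STS systems are ranked by Pearson correlation with the gold judgments.
pearson = np.corrcoef(predicted, gold)[0, 1]
print(f"Pearson r = {pearson:.3f}")
```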