2018
DOI: 10.1007/978-3-319-76941-7_30

Cross-Lingual Document Retrieval Using Regularized Wasserstein Distance

Abstract: Many information retrieval algorithms rely on the notion of a good distance that allows efficient comparison of objects of different nature. Recently, a promising new metric called Word Mover's Distance was proposed to measure the divergence between text passages. In this paper, we demonstrate that this metric can be extended to incorporate term-weighting schemes and to provide more accurate and computationally efficient matching between documents using entropic regularization. We evaluate the benefits of both ext…
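The entropic regularization mentioned in the abstract is commonly computed with Sinkhorn iterations. The following is a minimal sketch of a regularized Wasserstein distance between two bag-of-words histograms, assuming a precomputed cost matrix of embedding distances; the function and variable names are illustrative, not taken from the paper:

```python
import numpy as np

def sinkhorn_distance(a, b, C, reg=0.1, n_iter=200):
    """Entropic-regularized Wasserstein distance between two histograms.

    a, b : term-weight histograms of the two documents (each sums to 1)
    C    : cost matrix, e.g. Euclidean distances between word embeddings
    reg  : regularization strength; smaller values approach exact OT
    """
    K = np.exp(-C / reg)                 # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):             # Sinkhorn fixed-point iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]      # approximate transport plan
    return float(np.sum(P * C))
```

Compared with the exact linear program behind WMD, each iteration is only a pair of matrix-vector products, which is what makes the regularized variant attractive for retrieval over many documents.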

Cited by 7 publications (7 citation statements). References 18 publications.
“…Textual metrics that consider specific qualities in the system outputs, like complexity and diversity, are also used to evaluate NLG systems (Dusek et al., 2019; Hashimoto et al., 2019; Sagarkar et al., 2018; Purdy et al., 2018). Word mover's distance has recently been used for NLP tasks like learning word embeddings (Zhang et al., 2017; Wu et al., 2018), textual entailment (Sulea, 2017), document similarity and classification (Kusner et al., 2015; Huang et al., 2016; Atasu et al., 2017), image captioning (Kilickaya et al., 2017), document retrieval (Balikas et al., 2018), clustering for semantic word-rank (Zhang and Wang, 2018), and as additional loss for text generation that measures the optimal transport between the generated hypothesis and reference text (Chen et al., 2019). We investigate WMD for multi-sentence text evaluation and generation and introduce sentence embedding-based metrics.…”
Section: Related Work
confidence: 99%
“…Since the original WMD is computationally expensive, we approximate the distance by using the Regularized Wasserstein distance proposed by [41] and only keep the five closest articles. The five articles with the least distance are then selected for computation with the original WMD.…”
Section: Content Analysis: Semantic Distance Analysis
confidence: 99%
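The two-stage scheme this citing paper describes (screen all candidates with the cheap regularized distance, then re-score only the closest few with the original WMD) can be sketched as follows. This is a sketch under assumptions: documents are histograms over a shared vocabulary with one fixed cost matrix, and a weakly regularized Sinkhorn pass stands in for the exact WMD re-ranking step; all names are illustrative:

```python
import numpy as np

def sinkhorn(a, b, C, reg, n_iter=200):
    # entropic-regularized optimal-transport cost (cheap WMD surrogate)
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return float(np.sum((u[:, None] * K * v[None, :]) * C))

def two_stage_retrieval(q, docs, C, k=5):
    """Rank `docs` against the query histogram `q`.

    Stage 1 screens every document with strong regularization (fast but
    coarse); stage 2 re-ranks only the k closest with weak regularization,
    standing in here for the exact WMD used by the citing paper.
    """
    coarse = sorted((sinkhorn(q, d, C, reg=1.0), i)
                    for i, d in enumerate(docs))
    shortlist = [i for _, i in coarse[:k]]
    return sorted((sinkhorn(q, docs[i], C, reg=0.05), i)
                  for i in shortlist)
```

The design point is that the expensive, near-exact computation runs on a constant-size shortlist (five articles in the citing paper) rather than the whole corpus.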
“…However, these methods have been solely applied in the monolingual space. Other methods have been proposed to leverage EMD for cross-lingual document retrieval [4]; however, these methods treat individual words as the base semantic unit for comparison. The large number of tokens present in web documents, coupled with the cubic complexity of WMD, makes these approaches intractable for large-scale web alignment.…”
Section: Related Work
confidence: 99%