Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) 2014
DOI: 10.3115/v1/p14-2080

Learning Translational and Knowledge-based Similarities from Relevance Rankings for Cross-Language Retrieval

Abstract: We present an approach to cross-language retrieval that combines dense knowledge-based features and sparse word translations. Both feature types are learned directly from relevance rankings of bilingual documents in a pairwise ranking framework. In large-scale experiments for patent prior art search and cross-lingual retrieval in Wikipedia, our approach yields considerable improvements over learning-to-rank with either only dense or only sparse features, and over very competitive baselines that combine state-of-…

Citations: cited by 18 publications (21 citation statements)
References: 17 publications
“…where W ∈ R^{P×Q} encodes a feature matrix (Bai et al., 2010; Schamoni et al., 2014). The value of f(·, ·) is the prediction of the classifier given a target vector p and a vector of related features q.…”
Section: Baseline Features System (mentioning)
confidence: 99%
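
The quoted statement omits the formula that defines f. As a minimal sketch only, assuming the score is the bilinear form f(p, q) = pᵀWq that is commonly paired with a feature matrix W ∈ R^{P×Q} (the excerpt does not confirm this), the prediction could be computed as follows. The function name `bilinear_score` and the toy values are hypothetical.

```python
from typing import Sequence


def bilinear_score(
    p: Sequence[float],
    W: Sequence[Sequence[float]],
    q: Sequence[float],
) -> float:
    """Return p^T W q, an ASSUMED form of f(p, q); not taken from the source.

    p has length P, q has length Q, and W has shape P x Q.
    """
    if len(W) != len(p) or any(len(row) != len(q) for row in W):
        raise ValueError("W must have shape len(p) x len(q)")
    # Sum p_i * W_ij * q_j over all feature pairs (i, j).
    return sum(
        p_i * w_ij * q_j
        for p_i, row in zip(p, W)
        for w_ij, q_j in zip(row, q)
    )


# Toy example with P = 2 and Q = 3 (values chosen only for illustration).
p = [1.0, 2.0]
q = [1.0, 0.0, 1.0]
W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
score = bilinear_score(p, W, q)
```

In this form each entry W[i][j] weights the interaction between feature i of the target vector p and feature j of q, which is why W is described as encoding a feature matrix.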
“…See Figure 1 for an illustration. This data construction process is similar to (Schamoni et al, 2014) who made an English-German CLIR dataset, but ours is at a larger scale. Specifically, we use Wikipedia dumps released on August 23, 2017.…”
Section: Large-scale CLIR Dataset (mentioning)
confidence: 97%
“…The intuition is that the first sentence is usually a well-defined summary of its corresponding article and should be thematically related for articles linked to it from another language. Similar to (Schamoni et al, 2014), title words from the query sentences are removed, because they may be present across different language editions. This deletion prevents the task from becoming an easy keyword matching task.…”
Section: Large-scale CLIR Dataset (mentioning)
confidence: 99%
“…We leave the investigation of complex queries to future work. We want to emphasize that our mining pipeline is compatible with all query types; for example, we can use the first sentences of documents as queries (Schamoni et al., 2014; Sasaki et al., 2018) if desired. [Footnote 5] Note that documents in different languages do not share document IDs.…”
Section: Design Choices (mentioning)
confidence: 99%