2012
DOI: 10.1007/978-3-642-33247-0_8

Cross-Language High Similarity Search Using a Conceptual Thesaurus

Abstract: This work addresses the issue of cross-language high similarity and near-duplicates search, where, for the given document, a highly similar one is to be identified from a large cross-language collection of documents. We propose a concept-based similarity model for the problem which is very light in computation and memory. We evaluate the model on three corpora of different nature and two language pairs English-German and English-Spanish using the Eurovoc conceptual thesaurus. Our model is compared wi…

Cited by 25 publications
(25 citation statements)
References 14 publications
“…For that, a vector of concepts is built for each textual unit using dictionaries or thesaurus. The similarity between the vectors of concepts can be measured using the Cosine similarity, Euclidean distance, or any [Fig. 1: Taxonomy of different approaches for cross-language similarity detection [10]. MT-based models: Kent and Salim [18], Muhr et al. [29], SS-CL-AES [3], CL-PDAE [2]. Comparable corpora-based models: CL-KGA [11], CL-ESA [12]. Parallel corpora-based models: CL-ASA [6], CL-LSI [35], CL-KCCA [42], CL-AE-LSI [17]. Dictionary-based models: CL-CTS [15], CL-DBLI [32], CL-PDAE [2]. Syntax-based models: Length Model [16], CL-CNG [22].]…”
Section: Cross-language Semantic Textual Similarity Detection (mentioning)
confidence: 99%
“…In [15] a Cross-Language Conceptual Thesaurus-Based Similarity model (CL-CTS) is proposed to measure the similarity between textual units written in different languages (Spanish, English and German). CL-CTS is based on the thesaurus concept vectors presented in Eurovoc, where a Cosine similarity is computed between these vectors.…”
Section: Cross-language Semantic Textual Similarity Detection (mentioning)
confidence: 99%
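
The idea described in the statement above (map each text to a vector of language-independent thesaurus concepts, then compare the vectors with cosine similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the thesaurus entries, the concept ids, the function names, and the whitespace tokenization are all hypothetical simplifications, and real Eurovoc data is not used here.

```python
import math
from collections import Counter

# Hypothetical toy thesaurus: maps (language, term) to a language-independent
# concept id. These entries are illustrative only, not actual Eurovoc data.
TOY_THESAURUS = {
    ("en", "fishing"): "C1",
    ("de", "fischerei"): "C1",
    ("en", "agriculture"): "C2",
    ("de", "landwirtschaft"): "C2",
    ("en", "trade"): "C3",
    ("de", "handel"): "C3",
}


def concept_vector(text, lang, thesaurus=TOY_THESAURUS):
    """Count how often each thesaurus concept is triggered by the text.

    Assumption: naive lowercase whitespace tokenization; tokens that are
    not in the thesaurus are ignored.
    """
    tokens = text.lower().split()
    return Counter(
        thesaurus[(lang, tok)] for tok in tokens if (lang, tok) in thesaurus
    )


def cosine(u, v):
    """Cosine similarity between two sparse concept-count vectors."""
    dot = sum(u[c] * v[c] for c in u if c in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a text with no known concepts matches nothing
    return dot / (norm_u * norm_v)


en_doc = concept_vector("fishing and agriculture trade", "en")
de_doc = concept_vector("fischerei und landwirtschaft handel", "de")
de_other = concept_vector("handel handel", "de")

print(cosine(en_doc, de_doc))    # documents sharing all three concepts
print(cosine(en_doc, de_other))  # documents sharing only one concept
```

Because both texts are reduced to concept ids before comparison, no translation step is needed at query time; the vectors are as small as the number of concepts the text touches.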
“…The cross-language alignment-based similarity analysis (CL-ASA) model [3,2] is instead based on a statistical machine translation technology that combines probabilistic translations, using a statistical bilingual dictionary and similarity analysis. Finally, the cross-language conceptual thesaurus based similarity (CL-CTS) model [8] tries to measure the similarity between the documents in terms of shared concepts, using a conceptual thesaurus, and named entities among them. Some of these models have been compared in detecting CL plagiarism in [14].…”
Section: Introduction (mentioning)
confidence: 99%