Proceedings of the 15th Conference of the European Chapter of The Association for Computational Linguistics: Volume 2 2017
DOI: 10.18653/v1/e17-2066

Using Word Embedding for Cross-Language Plagiarism Detection

Abstract: This paper proposes to use distributed representation of words (word embeddings) in cross-language textual similarity detection. The main contributions of this paper are the following: (a) we introduce new cross-language similarity detection methods based on distributed representation of words; (b) we combine the different methods proposed to verify their complementarity and finally obtain an overall F1 score of 89.15% for English-French similarity detection at chunk level (88.5% at sentence level) on a very …

Citations: cited by 35 publications (25 citation statements)
References: 12 publications (11 reference statements)

“…The research on plagiarism detection has yielded many approaches that employ lexical [4,18], syntactical [33,44], semantic [41,27], or cross-lingual text analysis [12,14]. These approaches reliably detect copied or moderately altered plagiarism.…”
Section: Related Work (mentioning)
confidence: 99%
“…Word embeddings are becoming an effective way to represent words by relatively low-dimensional vectors, where semantic relatedness is easily measured by the cosine similarity of two word vectors. Ferrero et al [8] introduce a syntax weighting in distributed representations of sentences, and prove its usefulness for textual similarity detection. But in their approach, they only utilized one large dataset to build word vectors.…”
Section: Related Work (mentioning)
confidence: 99%
“…CL-WES (Ferrero et al, 2017) consists in a cosine similarity on distributed representations of sentences, which are obtained by the weighted sum of each word vector in a sentence. As in previous section, each word vector is syntactically and frequentially weighted.…”
Section: Cross-language Word Embedding-based Similarity (mentioning)
confidence: 99%
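
The last citation statement describes CL-WES as a cosine similarity between sentence vectors, each built as a weighted sum of word vectors. The following is a minimal sketch of that idea only, not the authors' implementation. The toy two-dimensional embeddings, the function names `sentence_vector` and `cosine`, and the single per-word `weights` table (standing in for the syntactic and frequency weighting mentioned in the statement) are all assumptions made for illustration. It also assumes that English and French words already live in one shared embedding space.

```python
import math


def sentence_vector(tokens, embeddings, weights):
    """Return the weighted sum of the word vectors of a tokenized sentence.

    `weights` is a hypothetical per-word weight table; words missing from it
    get a weight of 1.0, and out-of-vocabulary words are skipped.
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for token in tokens:
        vector = embeddings.get(token)
        if vector is None:
            continue  # no embedding available for this word
        weight = weights.get(token, 1.0)
        for i, value in enumerate(vector):
            total[i] += weight * value
    return total


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy shared English-French embedding space (made-up values).
embeddings = {
    "cat": [1.0, 0.0],
    "chat": [0.9, 0.1],
    "sleeps": [0.0, 1.0],
    "dort": [0.1, 0.9],
}
# Hypothetical weights, e.g. giving nouns more influence than verbs.
weights = {"cat": 2.0, "chat": 2.0}

english = sentence_vector(["the", "cat", "sleeps"], embeddings, weights)
french = sentence_vector(["le", "chat", "dort"], embeddings, weights)
similarity = cosine(english, french)
print(similarity)
```

In this sketch a sentence pair would be flagged as similar when `similarity` exceeds a chosen threshold; how that threshold and the weights are set is outside what the citation statement describes.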