Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) 2014
DOI: 10.3115/v1/d14-1153

Intrinsic Plagiarism Detection using N-gram Classes

Abstract: When it is not possible to compare the suspicious document to the source document(s) plagiarism has been committed from, the evidence of plagiarism has to be looked for intrinsically in the document itself. In this paper, we introduce a novel language-independent intrinsic plagiarism detection method which is based on a new text representation that we called n-gram classes. The proposed method was evaluated on three publicly available standard corpora. The obtained results are comparable to the ones obtained by…

citations
Cited by 20 publications
(27 citation statements)
references
References 10 publications
“…Before starting the analysis, let us recall that Stamatatos' method is a well-known IPD method and we provided a brief description of it in Section 2.2. As for our method, it was first introduced in the short paper (Bensalem et al 2014), and we will provide a detailed description of it in the next section.…”
Section: Discussion
mentioning
confidence: 99%
“…One of the most straightforward text representation approaches used in IPD methods is character n-grams. Some methods use them alone (Bensalem et al 2014; Kestemont et al 2011; Stamatatos 2009a), while others include additional features (Kern et al 2012; Kuznetsov et al 2016; Rao et al 2011; Stein et al 2011). Character n-grams are known to be a powerful and effective text representation in style analysis-based tasks such as authorship attribution (Kešelj et al 2003; Stamatatos 2016) and authorship verification (Brocardo et al 2013; Jankowska et al 2014).…”
mentioning
confidence: 99%
“…Bensalem introduced a new language-independent intrinsic plagiarism detection method based on a new text representation, n-gram classes, that is, a classification of n-grams by how often they occur. Examples are the class of the most frequently occurring n-grams, the most frequent occurrence class, and the intermediate occurrence class [8]. Palkovskii combined all the earlier results from the PAN12 and PAN13 studies and improved the previously developed plagiarism detection method with the help of contextual n-grams, surrounding-context n-grams, entity-based n-grams, and others [9].…”
Section: Introduction
unclassified
“…In [17] explained n-gram class as a number from 0 to m−1 such that the class labeled 0 involves the least frequent n-grams and the class labeled m−1 contains the most frequent n-grams in a document. If m > 2, classes between 0 and m−1 will contain n-grams with intermediate frequency levels.…”
Section: Methodology and System Framework
mentioning
confidence: 99%
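
The class labelling described in the statement above can be illustrated with a minimal Python sketch. The statement only says that class 0 holds the least frequent n-grams and class m−1 the most frequent ones; it does not give the rule that maps a frequency to a class. The sketch therefore assumes a logarithmic binning relative to the most frequent n-gram, where each halving of frequency drops the n-gram by one class. That rule, and the names `char_ngrams` and `ngram_classes`, are illustrative assumptions, not the method of the cited paper.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int) -> list[str]:
    """Return all overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_classes(text: str, n: int = 1, m: int = 3) -> dict[str, int]:
    """Map each character n-gram in `text` to a class label in 0..m-1.

    Class m-1 holds the most frequent n-grams and class 0 the least
    frequent ones. ASSUMPTION: classes are log2 bins relative to the
    maximum frequency; the quoted text does not specify the binning rule.
    """
    counts = Counter(char_ngrams(text, n))
    if not counts:
        return {}
    max_freq = max(counts.values())
    classes = {}
    for gram, freq in counts.items():
        # Number of times the frequency can be halved from the maximum.
        drop = int(math.log2(max_freq / freq))
        # Clamp at 0 so every rare n-gram falls into the lowest class.
        classes[gram] = max(0, m - 1 - drop)
    return classes


if __name__ == "__main__":
    # 'a' occurs 4 times, 'b' twice, 'c' once.
    print(ngram_classes("aaaabbc", n=1, m=3))
```

With m = 2 the same function yields only two classes (frequent and infrequent), which matches the remark that intermediate classes appear only when m > 2.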