2013
DOI: 10.1162/coli_a_00153

Plagiarism Meets Paraphrasing: Insights for the Next Generation in Automatic Plagiarism Detection

Abstract: In this paper, we describe an approach to create a summary obfuscation corpus for the task of plagiarism detection. Our method is based on information from the Document Understanding Conferences related to years 2001 and 2006, for the English language. Overall, an unattributed summary used within someone else's document is considered a kind of plagiarism because the main author's ideas are still in a succinct form. In order to create the corpus, we use a Named Entity Recognizer (NER) to identify the entities w…

Cited by 98 publications (78 citation statements)
References 17 publications
“…There are other possibilities we would like to explore in the future, such as the use of TF*IDF weights and the investigation of in how far the size of the window which represents the context influences the system performance. Our lexical simplification system could also help to normalize paraphrases to the simplest word choice, which could be useful in plagiarism detection [2].…”
Section: Discussion
confidence: 99%
“…from others' work into one's own documents without citing the corresponding source of information; thus, plagiarism comes into the picture. Plagiarism is the reuse of someone else's ideas, processes, results or words without explicitly acknowledging the author's work and source [1].…”
Section: Introduction
confidence: 99%
“…Therefore, the creation of linguistic typologies is one of the main tasks to carry out. There are some examples of such theories that are elaborating the most complex phenomena (Shutova 2011; Barrón-Cedeño 2013; Low 2010). One of the most complex subjects is a linguistic ambiguity, that, despite of a great number of references, is not well elaborated due to its complexity and diversity of its representations in the languages.…”
Section: Introduction
confidence: 99%