2006
DOI: 10.1002/spe.750

Efficient plagiarism detection for large code repositories

Abstract: Unauthorized re‐use of code by students is a widespread problem in academic institutions, and raises liability issues for industry. Manual plagiarism detection is time‐consuming, and current effective plagiarism detection approaches cannot be easily scaled to very large code repositories. While there are practical text‐based plagiarism detection systems capable of working with large collections, this is not the case for code‐based plagiarism detection. In this paper, we propose techniques for detecting plagiar…
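Comparing every pair of files grows quadratically with repository size, which is the scaling problem the abstract describes. Because the abstract is truncated, the sketch below does not reconstruct the paper's own technique; it shows one generic alternative, assumed purely for illustration: store token n-grams in an inverted index so that only files sharing at least one n-gram with a query are ever scored. The class `NgramIndex` and its methods are hypothetical names, and the tokens are assumed to come from some upstream tokenizer (here, a plain whitespace split).

```python
from collections import defaultdict


def ngrams(tokens, n):
    """Return the set of contiguous token n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


class NgramIndex:
    """Hypothetical inverted index: maps each token n-gram to the files containing it."""

    def __init__(self, n=4):
        self.n = n
        self.postings = defaultdict(set)

    def add(self, name, tokens):
        # Register the file under every distinct n-gram it contains.
        for gram in ngrams(tokens, self.n):
            self.postings[gram].add(name)

    def candidates(self, tokens):
        """Rank indexed files by the number of distinct n-grams shared with the query."""
        shared = defaultdict(int)
        for gram in ngrams(tokens, self.n):
            # Only files on this n-gram's posting list are touched; others are never visited.
            for name in self.postings.get(gram, ()):
                shared[name] += 1
        return sorted(shared.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    index = NgramIndex(n=3)
    index.add("a.c", "int x = 0 ; x ++ ;".split())
    index.add("b.c", "int y = 1 ;".split())
    print(index.candidates("int x = 0 ; y ++ ;".split()))  # [('a.c', 3)]
```

In a fuller pipeline, the token stream would typically be normalized first and only the top-ranked candidates passed on to a slower, more exact comparison.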

Cited by 84 publications (84 citation statements); references 14 publications.
“…It has been defined as one of the code cloning patterns by (Kapser 2006; Kapser and Godfrey 2008). Boilerplate code can be found when building device drivers for operating systems (Baxter et al 1998), developing Android applications, and giving programming assignments (Burrows et al 2007; Schleimer et al 2003). Boilerplate code usually contains small code modifications in order to adapt it to a new environment.…”
Section: Source Code Modifications (mentioning, confidence: 99%)
“…Different similarity techniques, such as suffix trees, string alignment, and Jaccard similarity, can be applied to sequences or sets of tokens. Tools that rely on tokens include Sherlock (Joy and Luck 1999), BOSS (Joy et al 2005), Sim (Gitchell and Tran 1999), YAP3 (Wise 1996), JPlag (Prechelt et al 2002), CCFinder (Kamiya et al 2002), CP-Miner (Li et al 2006), MOSS (Schleimer et al 2003), Burrows et al (2007), and the Source Code Similarity Detector System (SCSDS) (Duric and Gasevic 2013). The token-based representation is widely used in source code similarity measurement and is very efficient at the scale of millions of SLOC.…”
Section: Code Similarity Measurement (mentioning, confidence: 99%)
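As a minimal sketch of the token-based idea in this excerpt, the code below tokenizes C-like source with a crude regular expression, replaces every identifier outside a small keyword set with the placeholder `ID` (so renamed variables, a typical small modification, still match), and measures Jaccard similarity between the sets of token 3-grams. The tokenizer, the hypothetical `KEYWORDS` subset, and the choice of k = 3 are assumptions for illustration, not the behavior of any tool named in the excerpt.

```python
import re

# Hypothetical subset of C keywords that are kept verbatim rather than normalized.
KEYWORDS = {"int", "float", "char", "if", "else", "for", "while", "return", "void"}

# Crude tokenizer: identifiers, integer literals, or any other single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(source):
    """Split source into tokens, mapping non-keyword identifiers to 'ID'."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        is_identifier = tok[0].isalpha() or tok[0] == "_"
        tokens.append("ID" if is_identifier and tok not in KEYWORDS else tok)
    return tokens


def shingles(tokens, k=3):
    """Set of contiguous k-token windows."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    """|A & B| / |A | B|, defined as 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(src_a, src_b, k=3):
    """Jaccard similarity of the normalized token k-gram sets of two sources."""
    return jaccard(shingles(tokenize(src_a), k), shingles(tokenize(src_b), k))


if __name__ == "__main__":
    original = "int total = 0; for (i = 0; i < n; i++) total += v[i];"
    renamed = "int sum = 0; for (j = 0; j < n; j++) sum += v[j];"
    print(similarity(original, renamed))  # 1.0
```

Renaming is invisible to this measure, but reordering statements or inserting dead code would lower the score, which is one reason token-based tools may go beyond plain set overlap.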
“…These approaches are the text-based, attribute-based, and structure-based approaches [4,5]. The text-based approach is the only one that is programming-language-independent, since it treats source code as raw text.…”
Section: Related Work (mentioning, confidence: 99%)
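The excerpt's point that the text-based approach is language-independent can be illustrated with Python's standard `difflib`, used here only as a stand-in for whatever text comparison the cited works apply. Nothing in the function knows the programming language; the optional whitespace collapsing is an assumed preprocessing step.

```python
import difflib


def text_similarity(a, b, normalize_whitespace=False):
    """Compare two files as raw text and return a ratio in [0, 1].

    No parsing or tokenization is done, so the same code works for any language.
    """
    if normalize_whitespace:
        # Collapse all runs of whitespace so pure layout changes do not count.
        a, b = " ".join(a.split()), " ".join(b.split())
    return difflib.SequenceMatcher(None, a, b).ratio()


if __name__ == "__main__":
    print(text_similarity("x = 1", "x = 1"))  # 1.0
    # A purely textual measure is sensitive to identifier renaming:
    print(text_similarity("total = total + 1", "s = s + 1") < 1.0)  # True
```

The second call shows the trade-off: language independence comes at the cost of being affected by simple renaming, which identifier normalization in a token-based approach can absorb.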
“…In general, there are three major approaches for detecting source code plagiarism: the text-based, attribute-based, and structure-based approaches [4,5]. The text-based approach determines similarity by treating source code as raw text; the attribute-based approach determines similarity based on source code attributes (e.g.…”
Section: Introduction (mentioning, confidence: 99%)
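The excerpt is cut off before it lists which attributes are used, so the counts below (keywords, identifiers, integer literals, operator characters, non-blank lines) are a hypothetical choice in the style of attribute-counting detectors. Each file becomes a vector of counts, and two vectors are compared with cosine similarity, which is also an assumption rather than the measure used in the cited work.

```python
import math
import re

# Hypothetical attribute definitions for C-like code.
KEYWORDS = {"int", "float", "char", "if", "else", "for", "while", "return", "void"}
OPERATOR_CHARS = set("+-*/%=<>!&|^~")
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def attribute_vector(source):
    """Summarize a file as [keywords, identifiers, integer literals, operator chars, non-blank lines]."""
    tokens = TOKEN_RE.findall(source)
    keywords = sum(t in KEYWORDS for t in tokens)
    identifiers = sum((t[0].isalpha() or t[0] == "_") and t not in KEYWORDS for t in tokens)
    literals = sum(t.isdigit() for t in tokens)
    operators = sum(t in OPERATOR_CHARS for t in tokens)
    lines = sum(1 for line in source.splitlines() if line.strip())
    return [keywords, identifiers, literals, operators, lines]


def cosine(u, v):
    """Cosine similarity of two count vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_sq = sum(x * x for x in u) * sum(y * y for y in v)
    return dot / math.sqrt(norm_sq) if norm_sq else 0.0


if __name__ == "__main__":
    a = "int total = 0;\ntotal = total + 1;"
    b = "int s = 0;\ns = s + 1;"
    print(attribute_vector(a), attribute_vector(b))  # [1, 3, 2, 3, 2] [1, 3, 2, 3, 2]
    print(cosine(attribute_vector(a), attribute_vector(b)))  # 1.0
```

Identical counts for the renamed pair show why attribute counts tolerate renaming, but they also mean that unrelated programs of similar size and shape can score highly.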
“…However, it is common for unethical students to plagiarize other people's work. For example, an Australian survey conducted in 2002 reported that 85% of a class at Monash University and 70% of a class at Swinburne University engaged in plagiarism during their studies [3].…”
Section: Introduction (mentioning, confidence: 99%)