2017
DOI: 10.1007/s10664-017-9564-7

A comparison of code similarity analysers

Abstract: Copying and pasting of source code is a common activity in software engineering. Often, the code is not copied verbatim; it may be modified for various purposes, e.g. refactoring, bug fixing, or even software plagiarism. These code modifications could affect, to some degree, the performance of code similarity analysers, including code clone and plagiarism detectors. We are interested in two types of code modification in this study: pervasive modifications, i.e. transformations that may have a global ef…

Citations: cited by 97 publications (58 citation statements)
References: 90 publications (132 reference statements)
“…It is true that treating source code as text enables the detection of cross-language plagiarism and collusion with minimal effort. Nevertheless, this treatment may reduce the detection accuracy [44]; the source code can be inaccurately parsed since source code grammars are different from text grammars. For instance, statement countMAX+=1; can be considered to be one word according to text grammars since no spaces are involved between the tokens.…”
Section: Related Work (mentioning)
confidence: 99%
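
The tokenization point in the statement above can be illustrated with a minimal sketch. It assumes a deliberately simplified token pattern (the names `TOKEN_RE`, `text_tokens`, and `code_tokens` are hypothetical and do not come from any of the cited tools) and only contrasts whitespace splitting with a lexer-style split of the same statement.

```python
import re

# Assumption: a simplified token pattern, not a full language grammar.
# It matches identifiers, integer literals, a few two-character operators,
# and otherwise any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\+=|-=|==|!=|<=|>=|\S")


def text_tokens(source):
    """Split on whitespace only, as a plain-text tool would."""
    return source.split()


def code_tokens(source):
    """Split into lexical tokens using the simplified pattern above."""
    return TOKEN_RE.findall(source)


statement = "countMAX+=1;"
print(text_tokens(statement))  # ['countMAX+=1;']
print(code_tokens(statement))  # ['countMAX', '+=', '1', ';']
```

Under whitespace splitting the whole statement is a single "word", so a renamed variable or a changed literal makes the entire unit mismatch; the lexer-style split isolates the change to one token.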
“…Some detection techniques have addressed the issue by considering the source code as raw text [7,50], removing any needs for language-specific components. Even though this kind of approach is applicable, it may lack effectiveness [43,44]. Occasionally, a given source code can be inaccurately tokenized since text grammars are different from source code grammars.…”
Section: Introduction (mentioning)
confidence: 99%
“…The first data set, called the generated data set, is used in our previous study of comparing 30 code similarity analysers [19]. It contains 100 Java source code files with pervasive code modifications.…”
Section: E Data Sets (mentioning)
confidence: 99%
“…It preprocesses source code before detecting clones by using pretty-printing, variable renaming, and code abstraction. We chose NiCad because it has been used in several clone studies [19], [26], [22] and it reports clones at method-level, similar to Vincent. Both Vincent and NiCad were configured with the default configurations.…”
Section: G Experimental Design (mentioning)
confidence: 99%
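
The variable-renaming step mentioned in the statement above can be sketched roughly as follows. This is not NiCad's implementation; it is a hypothetical, regex-based illustration (the names `KEYWORDS`, `IDENT_RE`, and `rename_identifiers` are assumptions) of how consistently renaming identifiers lets two fragments that differ only in variable names compare as equal.

```python
import re

# Assumption: a deliberately small keyword set; a real tool would rely on
# the full grammar of the target language.
KEYWORDS = {"int", "return", "if", "else", "for", "while"}
IDENT_RE = re.compile(r"[A-Za-z_]\w*")


def rename_identifiers(source):
    """Replace each distinct non-keyword identifier with x1, x2, ...
    in order of first appearance (simplified: ignores strings and comments)."""
    mapping = {}

    def repl(match):
        name = match.group(0)
        if name in KEYWORDS:
            return name
        if name not in mapping:
            mapping[name] = f"x{len(mapping) + 1}"
        return mapping[name]

    return IDENT_RE.sub(repl, source)


a = "int total = price + tax; return total;"
b = "int sum = cost + fee; return sum;"
print(rename_identifiers(a))  # int x1 = x2 + x3; return x1;
print(rename_identifiers(a) == rename_identifiers(b))  # True
```

Because the mapping is built in order of first use, the same variable always receives the same placeholder, so structural differences between fragments are preserved while naming differences are erased.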