2007
DOI: 10.1109/tse.2007.70725
Comparison and Evaluation of Clone Detection Tools

Abstract: Many techniques for detecting duplicated source code (software clones) have been proposed in the past. However, it is not yet clear how these techniques compare in terms of recall and precision as well as space and time requirements. This paper presents an experiment that evaluates six clone detectors based on eight large C and Java programs (altogether almost 850 KLOC). Their clone candidates were evaluated by one of the authors as an independent third party. The selected techniques cover the whole s…

Cited by 640 publications (591 citation statements)
References 30 publications (50 reference statements)
“…Although there is a large number of clone detectors, plagiarism detectors, and code similarity detectors invented in the research community, there are relatively few studies that compare and evaluate their performances. Bellon et al (2007) proposed a framework for comparing and evaluating 6 clone detectors, evaluated a large set of clone detection tools but only based on results obtained from the tools' published papers, Hage et al (2010) compare five plagiarism detectors against 17 code modifications, Burd and Bailey (2002) compare five clone detectors for preventive maintenance tasks, Biegel et al (2011) compare three code similarity measures to identify code that needs refactoring, Svajlenko and Roy (2016) developed and used a clone evaluation framework called BigCloneEval to evaluate 10 state-of-the-art clone detectors. Although these studies cover various goals of tool evaluation and cover the different types of code modification found in the chosen data sets, they suffer from two limitations: (1) the selected tools are limited to only a subset of clone or plagiarism detectors, and (2) the results are based on different data sets, so one cannot compare a tool's performance from one study to another tool's from another study.…”
Section: Motivation (mentioning)
confidence: 99%
“…Examples of code similarity analysers using graph-based approaches are the ones invented by Krinke (2001), Komondoor and Horwitz (2001), Chae et al (2013) and Chen et al (2014). Although the tools demonstrate high precision and recall (Krinke 2001), they suffer scalability issues (Bellon et al 2007).…”
Section: Code Similarity Measurement (mentioning)
confidence: 99%
“…With this definition, we build on the general definition of a code clone: ''two code fragments form a clone pair if they are similar enough according to a given definition of similarity'' (Bellon et al., 2007). Intuitively, we are interested in fragments that are similar enough that the clones are interesting for a developer of the system while changing it.…”
Section: Terminology (mentioning)
confidence: 99%
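
The "similar enough" notion in the statement above is deliberately parameterized by a similarity measure and a threshold. As a minimal sketch, assuming a line-based Jaccard similarity and an arbitrary 0.8 threshold (both illustrative choices, not the measure used by Bellon et al. 2007), a clone-pair check in Java could look like this:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative sketch only: the similarity measure (Jaccard over normalized lines)
// and the 0.8 threshold are assumptions, not taken from Bellon et al. (2007).
public class ClonePairCheck {

    // Normalize a fragment: trim whitespace and drop blank lines (Type-1 style normalization).
    static Set<String> normalizedLines(List<String> fragment) {
        return fragment.stream()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toSet());
    }

    // Jaccard similarity of the two fragments' normalized line sets.
    static double similarity(List<String> a, List<String> b) {
        Set<String> sa = normalizedLines(a);
        Set<String> sb = normalizedLines(b);
        Set<String> intersection = new HashSet<>(sa);
        intersection.retainAll(sb);
        Set<String> union = new HashSet<>(sa);
        union.addAll(sb);
        return union.isEmpty() ? 0.0 : (double) intersection.size() / union.size();
    }

    // "Similar enough": the pair is a clone if similarity reaches the chosen threshold.
    static boolean isClonePair(List<String> a, List<String> b, double threshold) {
        return similarity(a, b) >= threshold;
    }

    public static void main(String[] args) {
        List<String> f1 = List.of("int sum = 0;", "for (int i = 0; i < n; i++) {", "  sum += a[i];", "}");
        List<String> f2 = List.of("int sum = 0;", "for (int i = 0; i < n; i++) {", "    sum += a[i];", "}");
        System.out.println(isClonePair(f1, f2, 0.8)); // true: the fragments differ only in layout
    }
}
```

Swapping in a token-, metric-, or AST-based measure changes which pairs pass the threshold, which is exactly why the quoted definition leaves "similarity" open.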
“…Juergens, Deissenboeck & Hummel (2010b) reported on an experiment to investigate the differences between syntactical/representational and semantic/behavioural similarities of clones, defined as follows:
Type-1 clone: Similar code fragments except for variation in whitespace, layout and comments (Bellon et al., 2007)
Type-2 clone: Similar code fragments except for variation in identifiers, literals, types, whitespaces, layouts and comments (Bellon et al., 2007)
Type-3 clone: Similar code fragments except that some statements may be added or deleted in addition to variation in identifiers, literals, types, whitespaces, layouts or comments (Bellon et al., 2007)
Type-4 clone: Two or more code fragments that perform the same computation but are implemented by different syntactic variants (Roy, Cordy & Koschke, 2009)
Functionally similar clone (FSC): Code fragments that provide a similar functionality w.r.t. a given definition of similarity but can be implemented quite differently…”
Section: Related Work (mentioning)
confidence: 99%
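
The clone-type definitions listed above can be made concrete with small Java fragments. The methods below are invented, hypothetical examples that merely mirror those definitions; they do not come from any of the cited studies:

```java
public class CloneTypeExamples {

    // Original fragment.
    static int sumOriginal(int[] a) {
        int total = 0;
        for (int i = 0; i < a.length; i++) {
            total += a[i];
        }
        return total;
    }

    // Type-1: identical apart from whitespace, layout and comments.
    static int sumType1(int[] a) {
        int total = 0; // running sum
        for (int i = 0; i < a.length; i++) { total += a[i]; }
        return total;
    }

    // Type-2: identifiers renamed, structure unchanged.
    static int sumType2(int[] values) {
        int acc = 0;
        for (int k = 0; k < values.length; k++) {
            acc += values[k];
        }
        return acc;
    }

    // Type-3: a statement added on top of the Type-2 changes.
    static int sumType3(int[] values) {
        if (values == null) return 0; // added statement
        int acc = 0;
        for (int k = 0; k < values.length; k++) {
            acc += values[k];
        }
        return acc;
    }

    // Type-4: same computation, different syntactic variant.
    static int sumType4(int[] a) {
        return java.util.Arrays.stream(a).sum();
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println(sumOriginal(data) + " " + sumType1(data) + " "
                + sumType2(data) + " " + sumType3(data) + " " + sumType4(data)); // 6 6 6 6 6
    }
}
```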
“…Software metrics or code metrics are used to detect clones [22, 28, 23]. AST-based comparison is considered to be more accurate than token-based comparison [3]. Baxter et al. [2] detect code clones using abstract syntax trees.…”
Section: Related Work (mentioning)
confidence: 99%
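
To make the token-based baseline mentioned in the last statement concrete, here is a minimal sketch of token normalization, assuming a crude whitespace/punctuation tokenizer and a tiny keyword list; it is not a description of any of the cited tools:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch of token-based comparison with identifier/literal normalization.
// The tokenizer and keyword list are simplifying assumptions, not a real lexer.
public class TokenCloneSketch {

    // Crude tokenizer: split at every boundary between word and non-word characters.
    static List<String> tokenize(String code) {
        return Arrays.stream(code.split("(?<=\\W)|(?=\\W)"))
                .map(String::trim)
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // Map identifiers to ID and numeric literals to LIT, so renamed (Type-2) clones
    // produce identical token sequences.
    static List<String> normalize(List<String> tokens) {
        return tokens.stream()
                .map(t -> t.matches("\\d+") ? "LIT"
                        : (t.matches("[A-Za-z_]\\w*") && !isKeyword(t)) ? "ID"
                        : t)
                .collect(Collectors.toList());
    }

    static boolean isKeyword(String t) {
        return List.of("int", "for", "if", "while", "return").contains(t);
    }

    public static void main(String[] args) {
        String f1 = "int total = 0; for (int i = 0; i < n; i++) total += a[i];";
        String f2 = "int acc = 0; for (int k = 0; k < m; k++) acc += b[k];";
        // Equal normalized token sequences: reported as a clone despite renaming.
        System.out.println(normalize(tokenize(f1)).equals(normalize(tokenize(f2)))); // true
    }
}
```

Because normalization erases names, structural differences that an abstract syntax tree would expose stay invisible at the token level, which is one intuition behind the accuracy claim cited as [3].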