2013
DOI: 10.1002/smr.1592

Large‐scale inter‐system clone detection using suffix trees and hashing

Abstract: Detecting similar code between two systems has various applications, such as comparing two software variants or versions or finding potential license violations. Techniques for detecting suspiciously similar code must scale, in terms of the resources needed, to very large code corpora and need high precision because a human has to inspect the results. This paper demonstrates how suffix trees can be used to obtain a scalable comparison. The evaluation is carried out for very large code corpora. Our evaluation …

Cited by 19 publications (10 citation statements)
References 35 publications
“…They can be classified into two categories, fine-grained detection [17,23,27,39,41] and unit-level detection [19,35,40].…”
Section: Identifying Code Reuse Between Different Projects (mentioning)
confidence: 99%
“…Dedicated code search engines such as BlackDuck OpenHub (BlackDuck, 2016), Krugle (Aragon Consulting Group, Inc., 2018) or Searchcode (Boyter, Ben, 2018) cannot efficiently handle code clones with modifications. Hummel et al. (2010) and Koschke (2014) are among the first to propose scalable clone detection systems. However, the trade-off for the scalability is their ability to report only copy-and-paste clones or clones with variable renaming (i.e., Type-1 and Type-2 clones), while the largest number of clones found in software are clones with added or deleted statements (i.e., Type-3 clones).…”
Section: Background and Motivation (mentioning)
confidence: 99%
“…Thus, adding new projects to the code base under analysis or updating existing projects would result in the need to rerun the clone detection for the complete data set. Several of the proposed techniques that support incremental clone detection do not scale to large-scale data sets (Göde and Koschke, 2009; Kawaguchi et al., 2009; Nguyen et al., 2009) or do not detect Type-3 clones in sacrificing for scalability (Hummel et al., 2010; Koschke, 2014).…”
Section: Background and Motivation (mentioning)
confidence: 99%
“…After a hash value is computed for each input source file, code clones are retrieved from the databases where the hash values are stored. Koschke proposed code clone detection approach using suffix tree and MD5 hash function [20]. The goal of his research is to detect code clones between a subject systems and a set of other systems for finding potential license violations.…”
Section: Related Work (mentioning)
confidence: 99%
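
The hash-and-lookup idea mentioned in the last statement (a hash value per source file, with clones retrieved from a store of hash values) can be illustrated with a minimal sketch. This is not Koschke's implementation: it only covers exact file-level matches, the in-memory dictionary stands in for a database, and the function names, the whitespace normalization, and the sample paths are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def file_hash(source: str) -> str:
    """Return the MD5 hex digest of a source text.

    Assumption: blank lines and leading/trailing whitespace are dropped
    before hashing, so purely cosmetic layout changes do not matter.
    """
    lines = [line.strip() for line in source.splitlines() if line.strip()]
    return hashlib.md5("\n".join(lines).encode("utf-8")).hexdigest()


def build_index(corpus: dict) -> dict:
    """Map each hash value to the corpus file paths that produce it."""
    index = defaultdict(list)
    for path, source in corpus.items():
        index[file_hash(source)].append(path)
    return dict(index)


def find_file_clones(subject: dict, index: dict) -> dict:
    """For each subject file, list corpus files with the same hash value."""
    matches = {}
    for path, source in subject.items():
        digest = file_hash(source)
        if digest in index:
            matches[path] = index[digest]
    return matches


# Hypothetical data: one corpus file matches the subject file up to layout.
corpus = {
    "corpus/x.c": "int f() {\n  return 1;\n}\n",
    "corpus/y.c": "int g() { return 2; }\n",
}
subject = {"subject/z.c": "int f() {\n\treturn 1;\n}"}

clones = find_file_clones(subject, build_index(corpus))
print(clones)
```

A lookup of this kind costs one hash computation per file, which is why it scales well, but it only finds files that are identical after normalization. Detecting cloned fragments inside otherwise different files is where a token-level structure such as the suffix tree named in the title comes in.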