2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER) 2015
DOI: 10.1109/saner.2015.7081830

Threshold-free code clone detection for a large-scale heterogeneous Java repository

Abstract: Code clones are unavoidable entities in software ecosystems. A variety of clone-detection algorithms are available for finding code clones. For Type-3 clone detection at method granularity (i.e., similar methods with changes in statements), the dissimilarity threshold is one of the possible configuration parameters. Existing approaches use a single threshold to detect Type-3 clones across a repository. However, our study shows that to detect Type-3 clones at method granularity on a large-scale heterogeneous reposi…

Cited by 25 publications (13 citation statements)
References: 35 publications
“…An empirical study of using compilation/decompilation to enhance the performance of clone detection tool in three real-world system found similar results to our study (Ragkhitwetsagul and Krinke 2017b). Keivanloo et al (2015) discussed the problem of using a single threshold for clone detection over several repositories and propose a solution using threshold-free clone detection based on unsupervised learning. The method mainly utilises k-means clustering with the Friedman quality optimisation method.…”
Section: Related Work
Citation type: supporting
confidence: 81%
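To make the idea in the statement above concrete, here is a minimal sketch of separating clone pairs from non-clone pairs by clustering dissimilarity scores instead of applying one fixed cut-off. It is not the method of Keivanloo et al. (2015): it uses plain two-cluster k-means on one-dimensional scores and omits the Friedman quality optimisation entirely. The function names `two_means_1d` and `label_clone_pairs`, the score values, and the convention that a lower score means more similar are all assumptions made for illustration.

```python
from typing import List, Tuple


def two_means_1d(values: List[float], iterations: int = 100) -> Tuple[float, float]:
    """Cluster 1-D values into two groups with Lloyd's k-means (k = 2).

    Returns the (low, high) centroids. Initial centroids are the minimum
    and maximum value, so the result is deterministic.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    low, high = min(values), max(values)
    for _ in range(iterations):
        # Assign every value to its nearest centroid (ties go to `low`).
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        # `near_low` always contains the minimum, so it is never empty.
        new_low = sum(near_low) / len(near_low)
        new_high = sum(near_high) / len(near_high) if near_high else high
        if (new_low, new_high) == (low, high):
            break  # assignments are stable
        low, high = new_low, new_high
    return low, high


def label_clone_pairs(dissimilarities: List[float]) -> List[bool]:
    """Mark a pair as a clone when its score is nearer the low centroid.

    Assumption: lower dissimilarity means the two methods are more alike.
    """
    low, high = two_means_1d(dissimilarities)
    return [abs(d - low) <= abs(d - high) for d in dissimilarities]


# Hypothetical scores from two projects with different score distributions.
project_a = [0.05, 0.10, 0.12, 0.80, 0.85, 0.90]
project_b = [0.30, 0.35, 0.40, 0.95, 0.97]

labels_a = label_clone_pairs(project_a)
labels_b = label_clone_pairs(project_b)
```

In this made-up data, a single repository-wide cut-off of 0.2 would accept the first three pairs of `project_a` but none of `project_b`, whereas clustering each project's scores separately picks out the low-dissimilarity group in both. If all scores in a project are identical, the sketch labels every pair as a clone, so a real tool would need an extra check for that case.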
“…The configuration problem for clone detection tools including setting thresholds has been mentioned by several studies as one of the threats to validity (Wang et al 2001). There has also been an initiative to avoid using thresholds at all for clone detection (Keivanloo et al 2015). Hence, we try to avoid the problem of threshold sensitivity affecting our results.…”
Section: Scenario 4 (Ranked Results)
Citation type: mentioning
confidence: 99%
“…The code clone pairs were then manually validated for the training phase of the proposed method. As some recent research shows that the clone validation decision in some scenario depends on user's perspective [25], that is given a possible code clone pair to validate some judges might decide it to be a true positive clone pairs where others might say the opposite (especially in case of Type 3 and Type 4 clones). So to consider this generalization to the proposed method the whole set of code pairs were split into five parts to be validated by five different graduate research students from computer science background.…”
Section: High-level Details Of the Data Set
Citation type: mentioning
confidence: 99%
“…However, it is not clear if duplicate logging statements are indeed associated with code clones. Moreover, the performance of clone detection tools is dependent on the thresholds [54,34,50,40]. Choosing the optimal thresholds is a non-trivial task and the value may differ across systems [62].…”
Section: Construct Validity
Citation type: mentioning
confidence: 99%