Clone detection in source code by frequent itemset techniques

Wahler; Seipel; Wolff; Fischer, Monika

doi:10.1109/scam.2004.6

Cited by 86 publications

(66 citation statements)

References 11 publications


“…In the area of code clone detection, there are token-based [19], [28], AST-based [14], [29], and semantics-based techniques [30]. Previous work [12], [31] performed comparisons between the existing code clone detection techniques.…”

Section: B Code Clone Detection (mentioning)

confidence: 99%

CloCom: Mining existing source code for automatic comment generation

Wong

Liu

Tan

2015

2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER)

133


Abstract: Code comments are an integral part of software development. They improve program comprehension and software maintainability. The lack of code comments is a common problem in the software industry. Therefore, it is beneficial to generate code comments automatically. In this paper, we propose a general approach to generate code comments automatically by analyzing existing software repositories. We apply code clone detection techniques to discover similar code segments and use the comments from some code segments to describe the other similar code segments. We leverage natural language processing techniques to select relevant comment sentences. In our evaluation, we analyze 42 million lines of code from 1,005 open source projects from GitHub, and use them to generate 359 code comments for 21 Java projects. We manually evaluate the generated code comments and find that only 23.7% of the generated code comments are good. We report to the developers the good code comments, whose code segments do not have an existing code comment. Amongst the reported code comments, seven have been confirmed by the developers as good and committable to the software repository while the rest await for developers' confirmation. Although our approach can generate good and committable comments, we still have to improve the yield and accuracy of the proposed approach before it can be used in practice with full automation.


“…Techniques that detect many clones (high recall) also return many code fragments which are not clones (lower precision). In turn, techniques with a high precision will usually have a lower recall [27].…”

Section: Related Work (mentioning)

confidence: 99%
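
The precision/recall trade-off described in the statement above can be illustrated with a small sketch. The clone pairs, the file locations, and the two detectors (`strict` and `lenient`) are hypothetical and exist only to make the arithmetic concrete; they do not come from any of the cited papers.

```python
def precision_recall(reported, actual):
    """Return (precision, recall) for a set of reported clone pairs."""
    reported, actual = set(reported), set(actual)
    true_positives = len(reported & actual)
    # Precision: share of reported pairs that are real clones.
    precision = true_positives / len(reported) if reported else 0.0
    # Recall: share of real clones that were reported.
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical ground truth: three real clone pairs.
actual_clones = {
    ("a.c:10", "b.c:40"),
    ("a.c:55", "c.c:12"),
    ("b.c:90", "c.c:70"),
}

# A cautious detector reports one pair, and it is a real clone.
strict = {("a.c:10", "b.c:40")}

# A permissive detector finds every real clone plus three false alarms.
lenient = actual_clones | {
    ("a.c:1", "c.c:2"),
    ("b.c:5", "c.c:9"),
    ("a.c:3", "b.c:3"),
}

strict_scores = precision_recall(strict, actual_clones)
lenient_scores = precision_recall(lenient, actual_clones)
print("strict :", strict_scores)
print("lenient:", lenient_scores)
```

In this made-up data the cautious detector reaches full precision but finds only one of three clones, while the permissive one finds all three at half the precision.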

“…An itemset is called frequent, if it occurs in a percentage that exceeds a certain given support count σ [27]:…”

Section: Sequential Pattern Mining (mentioning)

confidence: 99%
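
The definition of a frequent itemset quoted above can be sketched in a few lines. This is a minimal brute-force illustration, not the algorithm of the cited papers: the transactions (one set of hypothetical items per source statement), the item names, and the threshold `min_support` (standing in for σ, taken as a fraction of transactions that must be strictly exceeded) are all assumptions made for the example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Map each itemset whose support exceeds min_support to its support."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            # Support: fraction of transactions containing every item.
            count = sum(1 for t in transactions if itemset <= t)
            support = count / n
            if support > min_support:
                result[itemset] = support
                found_any = True
        if not found_any:
            # No frequent itemset of this size, so no larger one can exist.
            break
    return result


# Hypothetical transactions: one item set per source statement.
transactions = [
    {"type=int", "op=assign", "call=read"},
    {"type=int", "op=assign"},
    {"type=int", "call=read"},
    {"op=assign", "call=write"},
]

frequent = frequent_itemsets(transactions, min_support=0.4)
for itemset, support in sorted(frequent.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), support)
```

Enumerating every candidate is exponential in the number of items, so this only works for toy inputs; practical miners such as Apriori prune candidates level by level using the same observation that stops the loop early here.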

“…This includes textual approaches, lexical approaches, syntactic approaches, semantic approaches, among others. Most of them are oriented to a specific computer language and they range from high precision to low precision, and from high recall to low recall [27]. Little work was done to explore the potential of using data mining techniques in code clone detection.…”

Section: Introduction (mentioning)

confidence: 99%


Code Clone Detection using Sequential Pattern Mining

El-Matarawy

El-Ramly

Bahgat

2015

IJCA


This paper presents a new technique for clone detection using sequential pattern mining titled EgyCD. Over the last decade many techniques and tools for software clone detection have been proposed such as textual approaches, lexical approaches, syntactic approaches, semantic approaches …, etc. In this paper, we explore the potential of data mining techniques in clone detection. In particular, we developed a clone detection technique based on sequential pattern mining (SPM). The source code is treated as a sequence of transactions processed by the SPM algorithm to find frequent itemsets. We run three experiments to discover code clones of Type I, Type II and Type III and for plagiarism detection. We compared the results with other established code clone detectors. Our technique discovers all code clones in the source code and hence it is slower than the compared code clone detectors since they discover few code clones compared with EgyCD.


“…Wahler et al. [33] gave the approach which convert the AST into XML and then by using data mining technique [1] it extract the clones. This approach was further refined by Evans & Fraser [15] to find near miss clones by using only AST leaves rather than the tree, but again it was not able to detect much of the exact clones.…”

Section: International Journal Of Computer Applications (0975-8887) (mentioning)

confidence: 99%

Literature Survey of Clone Detection Techniques

Gupta

Gupta

2014

IJCA


Code clones are the codes which have same code in the system and so it is difficult to locate all the same codes in the system when any change is to be done. Researchers have proved that almost 70% of the effort done during maintenance is just because of the occurrence the clones in the system. A number of approaches had been given earlier to detect various types of clones [39]. This paper presents the systematic literature review of all the detection approaches researched so far. Along with it this paper also gives the advantages to implement them and also all the defects due to which they were not able to completely detect the clones. It also gives a novel approach to automatically detect the clones irrespective of the matter that whether the code is in same order or any statement has been inserted, deleted or modified in the code fragment.

