Combining Clone Detection and Latent Semantic Indexing to Detect Re-implementations

Bauer, Veronika; Eder, Sebastian

doi:10.1109/saner.2016.26

Cited by 10 publications

(10 citation statements)

References 30 publications

“…We relate our findings to the following work:[9,11,12,13,21,22,25,28,31,33,34,39,40,41,43,47,51,53,54,56,60,67,69,71,72,77,79,81,82,83,84,88,91].…”

mentioning

confidence: 76%

“…In addition, we accounted for the different company contexts and philosophies, as mentioned above. 13 Coding means "categorising segments of data with a short name that simultaneously summarises and accounts for each piece of data" [16].…”

Section: Data Collection and Analysis Procedures

mentioning

confidence: 99%

“…Related literature: Research has been addressing discovering and tracking redundancies in the form of code clones [79,60,41] and re-implementations [56,12,13]. At this point, several industrial tools exist that support structural (as opposed to semantic) detection approaches on an industrially viable scale [34].…”

Section: RQ2 - Comparing Effects and Context Factors

mentioning

confidence: 99%

Comparing reuse practices in two large software-producing companies

Bauer

Vetrò

2016

Journal of Systems and Software

Self Cite

Context: Reuse can improve productivity and maintainability in software development. Research has proposed a wide range of methods and techniques. Are these successfully adopted in practice? Objective: We propose a preliminary answer by integrating two in-depth empirical studies on software reuse at two large software-producing companies. Method: We compare and interpret the study results with a focus on reuse practices, effects, and context. Results: Both companies perform pragmatic reuse of code produced within the company, not leveraging other available artefacts. Reusable entities are retrieved from a central repository, if present. Otherwise, direct communication with trusted colleagues is crucial for access. Reuse processes remain implicit and reflect the development style. In a homogeneous infrastructure-supported context, participants strongly agreed on higher development pace and less maintenance effort as reuse benefits. In a heterogeneous context with fragmented infrastructure, these benefits did not materialize. Neither case reports statistically significant evidence of negative side effects of reuse nor inhibitors. In both cases, a lack of reuse led to duplicate implementations. Conclusion: Technological advances have improved the way reuse concepts can be applied in practice. Homogeneity in development process and tool support seem necessary preconditions. Developing and adopting adequate reuse strategies in heterogeneous contexts remains challenging.

“…Retrieving a ranked list of clones is preferred over a full list of clone pairs in various contexts, such as finding similar code examples or searching for candidates for bug fixing (Ke et al, 2015). Code clone detectors that report a complete set of clones are not suitable for these tasks because a large number of clone pairs have to be manually investigated (Bauer et al, 2016). In these circumstances, the user would only need a ranked list of top n cloned code fragments instead.…”

Section: Background and Motivation

mentioning

confidence: 99%

Siamese: scalable and incremental code clone search via multiple code representations

Ragkhitwetsagul

Krinke

2019

Empir Software Eng

This paper presents a novel code clone search technique that is accurate, incremental, and scalable to hundreds of million lines of code. Our technique incorporates multiple code representations (i.e., a technique to transform code into various representations to capture different types of clones), query reduction (i.e., a technique to select clone search keywords based on their uniqueness), and a customised ranking function (i.e., a technique to allow a specific clone type to be ranked on top of the search results) to improve clone search performance. We implemented the technique in a clone search tool, called Siamese, and evaluated its search accuracy and scalability on three established clone data sets. Siamese offers the highest mean average precision of 95% and 99% on two clone benchmarks compared to seven state-of-the-art clone detection tools, and reported the largest number of Type-3 clones compared to three other code search engines. Siamese is scalable and can return cloned code snippets within 8 seconds for a code corpus of 365 million lines of code. Using an index of 130,719 GitHub projects, we demonstrate that Siamese's incremental indexing capability dramatically decreases the index preparation time for large-scale data sets with multiple releases of software projects. The paper discusses the applications of Siamese to facilitate software development and research with two use cases including online code clone detection and clone search with automated license analysis.

“…An approach has been proposed to examine if the differences present between the clones can be safely parameterized without causing any side-effects [32]. Another study has been presented in order to investigate whether a combination of clone detection and latent semantic indexing improves the detection of candidate re-implementations [4]. Another code clone search technique called Siamese has been used to improve clone search performance [20].…”

Section: Literature Review

mentioning

confidence: 99%

Intelligent token-based code clone detection system for large scale source code

Elkhail

Svacina

Černý

2019

Proceedings of the Conference on Research in Adaptive and Convergent Systems

Fragments of source-code that are similar are known as code-clones and can cause many difficulties within software applications. As developers develop large-scale applications, code-clones can become more and more pervasive throughout the code-base. There are many proposed methods for detecting such clones in applications and in this paper, we present a novel method for code-clone detection in large-scale repositories. Our token-based code-clone detector, called Intelligent Clone Detection Tool (ICDT), can detect both exact and near-miss clones from large repositories. We present our method for detecting clones and then report the evaluation of ICDT using a large-scale code-clone benchmark, BigCloneEval. Lastly, we compare ICDT to other publicly available and state-of-the-art tools. We find that ICDT is more than capable of finding code-clones in large-scale repositories to a high degree of accuracy.

CCS Concepts: Software and its engineering → Software configuration management and version control systems; Software maintenance tools; Formal software verification.
