Abstract: Semantic redundancies are frequently reported in practice and cause increased efforts for development and maintenance. However, instances are hard to find with existing approaches that tend to deliver a daunting number of imprecise findings for this specific problem. Can these issues be mitigated by combining different detection techniques? In this paper, we investigate whether a combination of clone detection and latent semantic indexing improves the detection of candidate re-implementations. We evaluate the c…
“…We relate our findings to the following work:[9,11,12,13,21,22,25,28,31,33,34,39,40,41,43,47,51,53,54,56,60,67,69,71,72,77,79,81,82,83,84,88,91].…”
mentioning
confidence: 76%
“…In addition, we accounted for the different company contexts and philosophies, as mentioned above. 13 Coding means "categorising segments of data with a short name that simultaneously summarises and accounts for each piece of data" [16].…”
Section: Data Collection and Analysis Procedures
mentioning
confidence: 99%
“…Related literature: Research has been addressing discovering and tracking redundancies in the form of code clones [79,60,41] and re-implementations [56,12,13]. At this point, several industrial tools exist that support structural (as opposed to semantic) detection approaches on an industrially viable scale [34].…”
Section: RQ2 - Comparing Effects and Context Factors
mentioning
Context: Reuse can improve productivity and maintainability in software development. Research has proposed a wide range of methods and techniques. Are these successfully adopted in practice? Objective: We propose a preliminary answer by integrating two in-depth empirical studies on software reuse at two large software-producing companies. Method: We compare and interpret the study results with a focus on reuse practices, effects, and context. Results: Both companies perform pragmatic reuse of code produced within the company, not leveraging other available artefacts. Reusable entities are retrieved from a central repository, if present. Otherwise, direct communication with trusted colleagues is crucial for access. Reuse processes remain implicit and reflect the development style. In a homogeneous infrastructure-supported context, participants strongly agreed on higher development pace and less maintenance effort as reuse benefits. In a heterogeneous context with fragmented infrastructure, these benefits did not materialize. Neither case reports statistically significant evidence of negative side effects of reuse nor inhibitors. In both cases, a lack of reuse led to duplicate implementations. Conclusion: Technological advances have improved the way reuse concepts can be applied in practice. Homogeneity in development process and tool support seem necessary preconditions. Developing and adopting adequate reuse strategies in heterogeneous contexts remains challenging.
“…Retrieving a ranked list of clones is preferred over a full list of clone pairs in various contexts, such as finding similar code examples or searching for candidates for bug fixing (Ke et al, 2015). Code clone detectors that report a complete set of clones are not suitable for these tasks because a large number of clone pairs have to be manually investigated (Bauer et al, 2016). In these circumstances, the user would only need a ranked list of top n cloned code fragments instead.…”
This paper presents a novel code clone search technique that is accurate, incremental, and scalable to hundreds of million lines of code. Our technique incorporates multiple code representations (i.e., a technique to transform code into various representations to capture different types of clones), query reduction (i.e., a technique to select clone search keywords based on their uniqueness), and a customised ranking function (i.e., a technique to allow a specific clone type to be ranked on top of the search results) to improve clone search performance. We implemented the technique in a clone search tool, called Siamese, and evaluated its search accuracy and scalability on three established clone data sets. Siamese offers the highest mean average precision of 95% and 99% on two clone benchmarks compared to seven state-of-the-art clone detection tools, and reported the largest number of Type-3 clones compared to three other code search engines. Siamese is scalable and can return cloned code snippets within 8 seconds for a code corpus of 365 million lines of code. Using an index of 130,719 GitHub projects, we demonstrate that Siamese's incremental indexing capability dramatically decreases the index preparation time for large-scale data sets with multiple releases of software projects. The paper discusses the applications of Siamese to facilitate software development and research with two use cases including online code clone detection and clone search with automated license analysis.
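The abstract describes query reduction as selecting clone search keywords based on their uniqueness. The following is a minimal sketch of that general idea only, assuming uniqueness is approximated by document frequency over an indexed corpus of token lists. The function names, the `max_terms` parameter, and the toy corpus are hypothetical and are not taken from Siamese.

```python
from collections import Counter


def document_frequencies(corpus):
    """Count, for each token, how many indexed fragments contain it."""
    df = Counter()
    for tokens in corpus:
        df.update(set(tokens))  # count each token once per fragment
    return df


def reduce_query(query_tokens, df, max_terms=3):
    """Keep the rarest distinct query tokens (lowest document frequency).

    Ties are broken alphabetically so the result is deterministic.
    This is an illustrative heuristic, not the tool's actual ranking.
    """
    distinct = list(dict.fromkeys(query_tokens))
    ranked = sorted(distinct, key=lambda tok: (df.get(tok, 0), tok))
    return ranked[:max_terms]


# Toy index of three tokenised code fragments (hypothetical data).
corpus = [
    ["int", "i", "=", "0", ";", "return", "i", ";"],
    ["int", "sum", "=", "0", ";", "return", "sum", ";"],
    ["int", "gcd", "=", "a", "%", "b", ";", "return", "gcd", ";"],
]
df = document_frequencies(corpus)
query = ["int", "gcd", "=", "a", "%", "b", ";"]
reduced = reduce_query(query, df, max_terms=3)
print(reduced)
```

Tokens that occur in every fragment, such as `int` or `;`, are dropped from the query first, so the search is driven by the terms that discriminate best between fragments.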
“…An approach has been proposed to examine if the differences present between the clones can be safely parameterized without causing any side-effects [32]. Another study has been presented in order to investigate whether a combination of clone detection and latent semantic indexing improves the detection of candidate re-implementations [4]. Another code clone search technique called Siamese has been used to improve clone search performance [20].…”
Fragments of source-code that are similar are known as code-clones and can cause many difficulties within software applications. As developers develop large-scale applications, code-clones can become more and more pervasive throughout the code-base. There are many proposed methods for detecting such clones in applications and in this paper, we present a novel method for code-clone detection in large-scale repositories. Our token-based code-clone detector, called Intelligent Clone Detection Tool (ICDT), can detect both exact and near-miss clones from large repositories. We present our method for detecting clones and then report the evaluation of ICDT using a large-scale code-clone benchmark, BigCloneEval. Lastly, we compare ICDT to other publicly available and state-of-the-art tools. We find that ICDT is more than capable of finding code-clones in large-scale repositories to a high degree of accuracy.
CCS CONCEPTS • Software and its engineering → Software configuration management and version control systems; Software maintenance tools; Formal software verification;
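To illustrate what token-based detection of exact and near-miss clones can look like in general, here is a minimal sketch. It assumes identifiers and numeric literals are normalised to placeholder tokens and that fragments are compared by Jaccard similarity over token trigrams. The keyword set, the regular expression, and all function names are hypothetical and do not describe ICDT's actual algorithm.

```python
import re

# Small illustrative keyword set; a real tokenizer would use the full language grammar.
KEYWORDS = {"int", "return", "for", "if", "else", "while"}
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(code):
    """Tokenise code and replace identifiers and numbers with placeholders."""
    tokens = []
    for tok in TOKEN_RE.findall(code):
        if tok in KEYWORDS:
            tokens.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")
        elif tok.isdigit():
            tokens.append("NUM")
        else:
            tokens.append(tok)
    return tokens


def ngrams(tokens, n=3):
    """Return the set of contiguous token n-grams."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def similarity(a, b, n=3):
    """Jaccard similarity of the normalised token n-gram sets of two fragments."""
    ga, gb = ngrams(normalize(a), n), ngrams(normalize(b), n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


original = "int total = 0; for (i = 0; i < n; i++) total += i;"
renamed = "int sum = 0; for (k = 0; k < m; k++) sum += k;"
near_miss = "int sum = 0; for (k = 0; k < m; k++) { sum += k; count++; }"

print(similarity(original, renamed))    # identical after normalisation
print(similarity(original, near_miss))  # partial overlap
```

A fragment pair whose score is 1.0 after normalisation differs only in identifier names or literal values, while a pair with a high but lower score contains added or changed statements; the threshold at which a pair is reported as a near-miss clone is a tuning choice.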