A complete and accurate cross-language clone detection tool can support the software forking process, in which the more reliable algorithms of a legacy system are reused by porting them from a code base in one language to a code base in another. Cross-language clone detection also helps in building code recommendation systems. This paper proposes a new technique to detect and classify cross-language clones of C and C++ programs by filtering the nodes of an ANTLR-generated parse tree built with a common grammar file, CPP14.g4. Parsing the input files with CPP14.g4 provides all the lexical and semantic information of the input source code. The two parse trees are serialized after selectively filtering their nodes. The serialized trees are then converted into term frequency-inverse document frequency (TF-IDF) vectors, and the cosine similarity of these vectors is used to classify the clone type. Filtering the C and C++ parse trees increases precision from 51% to 61%, and matching after renaming the input/output expressions yields an average precision of 91.97% and 95.37% for small-scale and large-scale repositories, respectively. The proposed cross-language clone detector achieves its highest precision, 95.37%, in finding all four clone types (types 1, 2, 3, and 4) across 16,032 semantically similar clone pairs of C and C++ code.
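The classification step can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation. It assumes each filtered parse tree has already been serialized into a flat token list; the ANTLR/CPP14.g4 parsing and node filtering are not shown. The `IO_RENAMES` table is a hypothetical stand-in for the renaming of input/output expressions, and the smoothed IDF form, ln((1 + n) / (1 + df)) + 1, is likewise an assumption.

```python
import math
from collections import Counter

# Hypothetical renaming table: maps C and C++ I/O names to shared placeholder tokens.
IO_RENAMES = {"printf": "OUTPUT", "puts": "OUTPUT", "cout": "OUTPUT",
              "scanf": "INPUT", "gets": "INPUT", "cin": "INPUT"}


def rename_io(tokens):
    """Replace language-specific I/O tokens with shared placeholders."""
    return [IO_RENAMES.get(t, t) for t in tokens]


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per token list.

    Uses smoothed IDF, ln((1 + n) / (1 + df)) + 1, so tokens shared by
    every document keep a non-zero weight (important when n == 2).
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity(tokens_a, tokens_b, rename=True):
    """TF-IDF cosine similarity of two token lists, optionally after I/O renaming."""
    if rename:
        tokens_a, tokens_b = rename_io(tokens_a), rename_io(tokens_b)
    vec_a, vec_b = tfidf_vectors([tokens_a, tokens_b])
    return cosine(vec_a, vec_b)


if __name__ == "__main__":
    c_tokens = "int main ( ) { int x ; scanf ( x ) ; printf ( x ) ; }".split()
    cpp_tokens = "int main ( ) { int x ; cin >> x ; cout << x ; }".split()
    print(similarity(c_tokens, cpp_tokens, rename=False))
    print(similarity(c_tokens, cpp_tokens, rename=True))
```

With only two documents, an unsmoothed IDF of ln(n / df) would give every shared token zero weight, so two identical programs would end up with all-zero vectors. The smoothing keeps shared structure in the comparison. On the toy C/C++ pair, mapping `scanf`/`cin` and `printf`/`cout` to common tokens raises the cosine score. This matches the direction of the gain the abstract reports from renaming, though it does not reproduce its numbers.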
Code clone detection plays a vital role in both industry and academia. The last three decades have seen more than 250 clone detection techniques, yet no single framework can detect and classify all four basic types of code clones with high precision. This lack of clone classification particularly affects universities and online learning platforms, which fail to validate projects or coding assignments submitted online. In this paper, we propose a complete and language-agnostic technique to detect and classify all four clone types in C, C++, and Java programs. The method first generates the parse tree and then extracts the functional tree, eliminating the preprocessing stage employed by previous clone detection techniques. The generated parse tree contains all the information necessary for detecting code clones. We employ TF-IDF cosine similarity to classify the clone types. The proposed technique achieves a precision of 100% in detecting the first two clone types and 98% in detecting type-3 and type-4 clones for small C, C++, and Java programs with an average line count of 5. It also outperforms existing tree-based clone detection tools, with an average precision of 98.07% on C, C++, and Java programs crawled from GitHub with an average line count of 15. This indicates that cosine similarity on the ANTLR functional tree accurately detects all four types of small clones and can serve as a validation tool for identifying the learning level shown in submitted programming assignments.
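One way to picture the functional-tree step is to keep only the subtrees rooted at function (or method) definitions and serialize them in pre-order. The sketch below is hypothetical. It represents the parse tree as nested `(rule_name, children)` tuples rather than the ANTLR runtime's context objects, so that it stays self-contained. The rule names in `FUNCTION_RULES` are assumptions modeled on common ANTLR grammars, and the paper's exact definition of the functional tree may differ.

```python
# A parse-tree node is either a leaf token (str) or a (rule_name, children) tuple.
# The rule names below are assumptions modeled on common ANTLR grammars.
FUNCTION_RULES = {"functionDefinition", "methodDeclaration"}


def function_subtrees(node):
    """Return every subtree rooted at a function or method definition."""
    if isinstance(node, str):
        return []
    rule, children = node
    if rule in FUNCTION_RULES:
        return [node]
    found = []
    for child in children:
        found.extend(function_subtrees(child))
    return found


def serialize(node):
    """Pre-order serialization: each rule name followed by its descendants."""
    if isinstance(node, str):
        return [node]
    rule, children = node
    tokens = [rule]
    for child in children:
        tokens.extend(serialize(child))
    return tokens


def functional_tree_tokens(tree):
    """Serialize only the function subtrees and drop everything else."""
    tokens = []
    for subtree in function_subtrees(tree):
        tokens.extend(serialize(subtree))
    return tokens


if __name__ == "__main__":
    tree = ("translationUnit", [
        ("declaration", ["int", "counter", ";"]),
        ("functionDefinition", [
            ("declSpecifier", ["int"]),
            ("declarator", ["main", "(", ")"]),
            ("compoundStatement", ["{", ("jumpStatement", ["return", "0", ";"]), "}"]),
        ]),
    ])
    print(functional_tree_tokens(tree))
```

The top-level `counter` declaration is dropped, and only the tokens under `functionDefinition` remain. That token list is the kind of input a TF-IDF cosine comparison would then consume.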
The "Smart Glasses" are designed to make it easier for blind people to read and decipher written English-language content. Blind people find it exceedingly challenging to travel alone: they run the risk of getting lost, and ordinary walking sticks do not let them move around public spaces independently. The objective of our work is to help blind people communicate more easily by developing smart assistive glasses using artificial intelligence. The glasses read out the text in any image as audio, which the wearer hears through a headset attached to the spectacles. OpenCV, optical character recognition (OCR), and the efficient and accurate scene text (EAST) detector were used to identify the text in the image. An ultrasonic sensor in the glasses measures the distance to the target so that a clear picture can be taken. The motion sensor directs the wearer to lecture halls, classrooms, and laboratories using a radio frequency identification (RFID) reader. The results show that the combination of the optical character recognizer and the EAST detector produces fairly accurate results, demonstrating the potential of the glasses to recognize text. Currently, the glasses support only English, and their working distance is 40 to 150 cm.
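The distance check can be illustrated with the usual time-of-flight formula, d = t · v / 2. Here t is the round-trip echo time and v is the speed of sound. This is a minimal sketch under assumptions the abstract does not state:

* the ultrasonic sensor reports a round-trip echo time in microseconds;
* sound travels at about 343 m/s;
* the hypothetical `should_capture` gate reuses the 40 to 150 cm working range as its trigger window.

```python
# Assumed speed of sound in air (~343 m/s at about 20 degrees C), in cm per microsecond.
SPEED_OF_SOUND_CM_PER_US = 0.0343

# Working range reported for the glasses.
MIN_RANGE_CM = 40.0
MAX_RANGE_CM = 150.0


def echo_to_distance_cm(echo_us):
    """Convert a round-trip echo time (microseconds) into a one-way distance (cm)."""
    # The pulse travels to the object and back, so halve the path length.
    return echo_us * SPEED_OF_SOUND_CM_PER_US / 2.0


def should_capture(echo_us, min_cm=MIN_RANGE_CM, max_cm=MAX_RANGE_CM):
    """Hypothetical gate: take a picture only when the target is within range."""
    distance = echo_to_distance_cm(echo_us)
    return min_cm <= distance <= max_cm


if __name__ == "__main__":
    for echo in (1000, 5831, 10000):
        print(echo, round(echo_to_distance_cm(echo), 1), should_capture(echo))
```

An echo of about 5,831 µs corresponds to roughly 100 cm and passes the gate. Echoes of 1,000 µs (about 17 cm) and 10,000 µs (about 172 cm) fall outside the window.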
Despite significant research over the past three decades, which has introduced more than 250 clone detection tools and techniques for finding same-language clones, no single framework exists that detects and classifies all four basic clone types with high accuracy (precision and recall). In this paper, we propose an accurate and language-agnostic technique to classify the four clone types. The method first generates an ANTLR parse tree for each input program file using freely available ANTLR grammar files. It then computes the edit distance between the two parse trees with the Levenshtein distance algorithm and converts the edit distance into a similarity score using […]. We obtained 100% precision and recall in detecting type-1 and type-2 clones, and 98.50% and 98.12% for type-3 and type-4 clones, respectively, on our datasets of C, C++, and Java microprograms. This paper provides evidence that the Levenshtein distance on ANTLR parse trees is a good choice for building a complete and accurate software clone detector that can serve as a validation tool for detecting code plagiarism.
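The Levenshtein step can be sketched over serialized parse trees. The abstract elides how the edit distance becomes a similarity, so the normalization used here, 1 − d / max(|a|, |b|), is an assumption. So is the premise that each ANTLR parse tree has already been flattened into a token sequence. The distance itself is the standard two-row dynamic program with unit costs for insertion, deletion, and substitution.

```python
def levenshtein(a, b):
    """Edit distance between two sequences; insert, delete, substitute each cost 1."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def edit_similarity(a, b):
    """Assumed normalization: 1 - distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    original = "int sum ( int a , int b ) { return a + b ; }".split()
    renamed = "int sum ( int x , int y ) { return x + y ; }".split()
    print(levenshtein(original, renamed), edit_similarity(original, renamed))  # 4 0.75
```

Renaming `a`/`b` to `x`/`y` in this 16-token function costs four substitutions, giving a score of 1 − 4/16 = 0.75. If the serialization kept token types instead of identifier text, the same rename would cost nothing.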