Software clone detection techniques identify fragments of code that share some level of syntactic similarity. In this study, we investigate security-sensitive clone clusters: clusters of syntactically similar fragments of code that are protected by some privileges. From a security perspective, security-sensitive clone clusters can help reason about the implemented security model: given syntactically similar fragments of code, it is expected that they are protected by similar privileges. We hypothesize that clones that violate this assumption, defined as security-discordant clones, are likely to reveal weaknesses and flaws in access control models.In order to characterize security-discordant clones, we investigated two of the largest and most popular open-source PHP applications: Joomla! and Moodle, with sizes ranging from hundred thousands to more than a million lines of code. Investigation of security-discordant clone clusters in these systems revealed several previously undocumented, recurring, and application-independent security weaknesses. Moreover, security-discordant clones also revealed four, previously unreported, security flaws. Results also show how these flaws were revealed through the investigation of as little as 2% of the code base. Distribution of weaknesses and flaws between the two systems is investigated and discussed. Potential extensions to this exploratory work are also presented.
Clone detection techniques quality and performance evaluation require a system along with its clone oracle, that is a reference database of all accepted clones in the investigated system. Many challenges, including finding an adequate clone definition and scalability to industrial size systems, must be overcome to create good oracles. This paper presents an original method to construct clone oracles based on the Levenshtein metric. Although other oracles exist, this is the largest known oracle for type-3 clones that was created by an automated process on massive data sets. The method behind the creation of the oracle as well as actual oracles characteristics are presented. Discussion of the results in relation to other ways of building oracles is also provided along with future research possibilities.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.