Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering 2014
DOI: 10.1145/2635868.2635886

How should we measure functional sameness from program source code? An exploratory study on Java methods

Abstract: Program source code is one of the main targets of software engineering research. A wide variety of research has been conducted on source code, and many studies have leveraged structural, vocabulary, and method signature similarities to measure the functional sameness of source code. In this research, we conducted an empirical study to ascertain how we should use three similarities to measure functional sameness. We used two large datasets and measured the three similarities between all the method pairs in the …

Cited by 14 publications (6 citation statements)
References: 50 publications

“…There are a variety of approaches to compute structural similarity of source code [49,50,51]. We used the de-duplication approach explained in [52] for its simplicity and effectiveness for method-level similarity computation.…”
Section: Problem Statement (mentioning)
confidence: 99%
“…To cope with this problem, they have randomly selected 100 Java projects from SourceForge in what was called a first attempt to run a statistically sound experiment in test data generation. According to them, the resulting benchmark, called SF100, is statistically sound and representative of open source projects. This issue also affects the evaluation of code search techniques: many times the target repositories are not a random sample of software projects, but a specific set of projects, which can introduce bias to the study.…”
Section: A Repository (mentioning)
confidence: 99%
“…Although such a property has received a lot of attention in the past, recently researchers have been looking at other types of replication, such as vocabulary or temporal redundancy. The idea is that sometimes two fragments of code can be similar with respect to other aspects besides text (think about different implementations of sorting algorithms, for instance, which can have the same function but different structure [6]). Vocabulary redundancy appears when different pieces of code share similar words [6], while temporal redundancy is concerned with the amount of code commits that are composed of previous commits [7].…”
Section: Introduction (mentioning)
confidence: 99%
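
The notion of vocabulary redundancy quoted above (different pieces of code sharing similar words) can be illustrated with a minimal sketch. It assumes that a fragment's vocabulary is the set of lower-cased words obtained by splitting its identifiers on underscores and camelCase boundaries, and that overlap is scored with the Jaccard index. Both choices, and the helper names `vocabulary` and `jaccard`, are illustrative assumptions and not the measure used in the cited study.

```python
import re


def vocabulary(code):
    """Return the set of lower-cased words found in the identifiers of a code fragment.

    Assumption: language keywords are treated like any other identifier.
    """
    words = set()
    for ident in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code):
        for part in ident.split("_"):
            # Split camelCase / PascalCase parts; runs of capitals stay together.
            for word in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", part):
                words.add(word.lower())
    return words


def jaccard(a, b):
    """Jaccard index of two word sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Two hypothetical method signatures with different algorithms but shared words.
sig_a = "int[] bubbleSort(int[] inputArray)"
sig_b = "void quick_sort(int[] input_array)"
score = jaccard(vocabulary(sig_a), vocabulary(sig_b))
print(round(score, 3))
```

Here the two signatures share the words `int`, `sort`, `input`, and `array` out of seven distinct words, so the score is 4/7 even though the method names differ.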
“…Beyond this (of course fuzzy) threshold, the diversity and uniqueness of source code appears. Higo and Kusumoto [11] investigate the interplay between structural similarity, vocabulary similarity and method name similarity, to assess functional similarity between methods in Java programs. They show that many contextual factors influence the ability of these similarity measures to spot functional similarity (e.g., the number of methods that share the same name, or the fact that two methods with similar structure are in the same class or not).…”
Section: Related Work (mentioning)
confidence: 99%