2017
DOI: 10.5381/jot.2017.16.4.a1

XCorpus – An executable Corpus of Java Programs.

Abstract: Empirical studies on code require standardized datasets of significant size extracted from real-world programs in order to be reproducible and generalisable. We argue that there is a need for such data sets that are executable and can therefore be used for experiments using static and dynamic analysis. A harness for such a data set should have high coverage in order to facilitate the construction of comprehensive models of program execution. We present XCorpus, a set of 76 executable, real-world Java programs, …

Cited by 27 publications (12 citation statements)
References 10 publications (12 reference statements)
“…We aim at improving current automated testing tools in a way that they avoid the generation of smelly test suites. Furthermore, we aim at replicating the study taking into account testing tools that work on different programming languages as well as different datasets (e.g., the Xcorpus one [100]).…”
Section: Results (mentioning)
confidence: 99%
“…Furthermore, we conducted the experiments on a dataset composed of a large number of classes extracted from the SF110 dataset [26]. While previous research [7,24] widely exploited such a dataset, experimenting a different one (e.g., XCorpus [100]) would increases the generalizability of the results. This is part of our future agenda.…”
Section: Threats To External Validity (mentioning)
confidence: 99%
“…A number of prior publications selected and curated corpora of projects, for performance benchmarking [Blackburn et al 2006], for static analysis [Tempero et al 2010], for dynamic analysis [Dietrich et al 2017b], and for repository mining in general [Allamanis and Sutton 2013]. Lopes et al [2017] conducted a study to measure code duplication in GitHub.…”
Section: Code Corpora (mentioning)
confidence: 99%
“…Standard datasets have been widely used to support research in many other areas of computer science. For instance, the programming language and software engineering communities use datasets such as DaCapo [17] and Qualitas Corpus/XCorpus [18], [19] for benchmarking and empirical studies on source code. Sourcerer [20] is an infrastructure for large-scale collection and analysis of open source code.…”
Section: Related Work (mentioning)
confidence: 99%