In order to increase our ability to use measurement to support software development practise we need to do more analysis of code. However, empirical studies of code are expensive and their results are difficult to compare. We describe the Qualitas Corpus, a large curated collection of open source Java systems. The corpus reduces the cost of performing large empirical studies of code and supports comparison of measurements of the same artifacts. We discuss its design, organisation, and issues associated with its development.
Large amounts of Java software have been written since the language's escape into unsuspecting software ecology more than ten years ago. Surprisingly little is known about the structure of Java programs in the wild: about the way methods are grouped into classes and then into packages, the way packages relate to each other, or the way inheritance and composition are used to put these programs together. We present the results of the first in-depth study of the structure of Java programs. We have collected a number of Java programs and measured their key structural attributes. We have found evidence that some relationships follow
power-laws
, while others do not. We have also observed variations that seem related to some characteristic of the application itself. This study provides important information for researchers who can investigate how and why the structural relationships we find may have originated, what they portend, and how they can be managed.
Identifying and correcting syntax errors is a challenge all novice programmers confront. As educators, the more we understand about the nature of these errors and how students respond to them, the more effective our teaching can be. It is well known that just a few types of errors are far more frequently encountered by students learning to program than most. In this paper, we examine how long students spend resolving the most common syntax errors, and discover that certain types of errors are not solved any more quickly by the higher ability students. Moreover, we note that these errors consume a large amount of student time, suggesting that targeted teaching interventions may yield a significant payoff in terms of increasing student productivity.
Large amounts of Java software have been written since the language's escape into unsuspecting software ecology more than ten years ago. Surprisingly little is known about the structure of Java programs in the wild: about the way methods are grouped into classes and then into packages, the way packages relate to each other, or the way inheritance and composition are used to put these programs together. We present the results of the first in-depth study of the structure of Java programs. We have collected a number of Java programs and measured their key structural attributes. We have found evidence that some relationships follow power-laws, while others do not. We have also observed variations that seem related to some characteristic of the application itself. This study provides important information for researchers who can investigate how and why the structural relationships we find may have originated, what they portend, and how they can be managed.
Advocates of the design principle avoid cyclic dependencies among modules have argued that cycles are detrimental to software quality attributes such as understandability, testability, reusability, buildability and maintainability, yet folklore suggests such cycles are common in real object-oriented systems. In this paper we present the first significant empirical study of cycles among the classes of 78 open-and closed-source Java applications. We find that, of the applications comprising enough classes to support such a cycle, about 45% have a cycle involving at least 100 classes and around 10% have a cycle involving at least 1,000 classes. We present further empirical evidence to support the contention these cycles are not due to intrinsic interdependencies between particular classes in a domain. Finally, we attempt to gauge the strength of connection among the classes in a cycle using the concept of a minimum edge feedback set.
Mastering syntax is one of the earliest challenges facing the novice programmer. Problem solving and algorithms are the focus of many first year programming classes, leaving students to learn syntax on their own while they practice writing code. In this paper we investigate the frequency with which students encounter syntax errors during a drill and practice activity. We find that students struggle with syntax to a greater extent than we anticipated, even when writing short fragments of code.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.