One of the difficulties in maintaining a large software system is that its business domain topics are not documented, and neither is the correlation between these domain topics and the source code. Without such a correlation, people with no prior knowledge of the application find it hard to understand what the system does. Latent Dirichlet Allocation (LDA), a statistical topic model, has emerged as a popular technique for discovering topics in large text corpora. However, its applicability to extracting business domain topics from source code has not yet been explored. This paper investigates LDA in the context of comprehending large software systems and proposes a human-assisted, LDA-based approach for extracting domain topics from source code. The method has been applied to a number of open source and proprietary systems. Preliminary results indicate that LDA is able to identify some of the domain topics and is a satisfactory starting point for further manual refinement of topics.
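LDA itself is a standard model, but the abstract does not say how source code is turned into text documents. The snippet below is a minimal sketch under stated assumptions, not the paper's implementation. It assumes that each source file is one LDA document and that its words are the parts of its identifiers, split on camelCase and snake_case. The keyword stop list `KEYWORDS`, the helpers `split_identifiers` and `lda_gibbs`, the hyperparameter defaults and the sample files are all hypothetical. The model is fitted by collapsed Gibbs sampling, which is one common way to fit LDA. The human-assisted refinement step is not modeled.

```python
import random
import re

# Hypothetical stop list: language keywords and accessor prefixes with no domain meaning.
KEYWORDS = {"class", "public", "private", "static", "void", "int", "double",
            "return", "new", "get", "set", "string"}


def split_identifiers(source):
    """Return lower-cased words from the identifiers in `source`,
    splitting snake_case and camelCase (e.g. accountBalance -> account, balance)."""
    words = []
    for ident in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source):
        for part in ident.split("_"):
            words.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", part))
    return [w.lower() for w in words if len(w) > 2 and w.lower() not in KEYWORDS]


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, top_n=5, seed=0):
    """Fit LDA by collapsed Gibbs sampling and return the top_n words of each topic."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]        # tokens in document d assigned to topic k
    nkw = [[0] * V for _ in range(K)]    # occurrences of word v assigned to topic k
    nk = [0] * K                         # total tokens assigned to topic k
    z = []                               # current topic of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Unnormalised full conditional p(z_i = t | all other assignments).
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    # Rank each topic's words by how many tokens of that word the topic holds.
    return [[vocab[v] for v in sorted(range(V), key=lambda v: -nkw[k][v])[:top_n]]
            for k in range(K)]


if __name__ == "__main__":
    # Hypothetical sample files; each file is treated as one LDA document.
    files = {
        "Account.java": "class Account { double accountBalance; "
                        "void depositAmount(double amount) {} void withdrawAmount(double amount) {} }",
        "Loan.java": "class LoanApplication { double interestRate; int loanTerm; "
                     "double computeMonthlyInstallment() { return 0; } }",
        "Invoice.java": "class Invoice { String customerName; double invoiceTotal; "
                        "void applyDiscount(double discountRate) {} }",
    }
    docs = [split_identifiers(src) for src in files.values()]
    for k, words in enumerate(lda_gibbs(docs, num_topics=3)):
        print(f"topic {k}: {' '.join(words)}")
```

Each topic comes back as a ranked word list rather than a named concept. In a human-assisted approach like the one the abstract describes, such lists would be the raw material that an engineer inspects, merges or renames into business domain topics.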
Many large legacy systems still undergo continuous maintenance and enhancement. Because of the sheer size and complexity of these systems and limited resources, managers face crucial decisions. Examples include allocating and training new engineers, allocating testing personnel intelligently, and assessing whether the software is ready for release. Bug prediction by mining software repositories holds promise and is a worthwhile endeavor. However, current state-of-the-art techniques are not accurate enough at predicting bugs and are therefore of limited use to managers. Rather than classifying files as buggy or not, we take a different viewpoint and focus on providing decision support for managers. In this paper we present a set of metrics to guide managers in making these decisions. The metrics are evaluated on four open source systems and two proprietary systems.