“…Although our work with the iLSI algorithm is closely related to the work reported in [21], there are two very important differences between how this algorithm was studied in [21] and how we use it here. Our work makes explicit the limitation that the iLSI algorithm is incapable of incorporating new information (source files and terms) as a software library evolves.…”
Section: Introduction
“…• iLSI: This algorithm was proposed by Jiang et al. [21] to incrementally update the LSA model of a dynamic collection of source files and related documentation for the purpose of search-based automated traceability link recovery.…”
Section: Introduction
“…In other words, we show that the iLSI algorithm is not the best choice for incrementally updating the LSA model of an evolving software repository. Second, Jiang et al.'s [21] experimental validation covers just two consecutive releases of the software libraries they worked with. In contrast, our experiments are based on commit-level information tracked over 10 years of commit history of the software libraries on which we report our results.…”
The problem of bug localization is to identify the source files related to a bug in a software repository. Information Retrieval (IR) based approaches create an index of the source files and learn a model that is then queried with a bug report for the relevant files. Despite advances in these tools, current approaches do not take into account the dynamic nature of software repositories: with traditional IR-based approaches to bug localization, the model parameters must be recalculated after each change to a repository. In contrast, this paper presents an incremental framework to update the parameters of the Latent Semantic Analysis (LSA) model as the data evolves. We compare two state-of-the-art incremental SVD update techniques for LSA with respect to retrieval accuracy and time performance. The dataset we used in our validation experiments was created by mining 10 years of version history of the AspectJ and JodaTime software libraries.
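To make the incremental idea concrete, here is a minimal fold-in sketch: one standard way to append new document columns to an existing rank-k LSA model without recomputing the SVD. This is an illustrative numpy sketch, not the iLSI algorithm or either paper's implementation, and the toy matrix is invented. Note that fold-in keeps the term basis U frozen, so, like the iLSI limitation highlighted above, it cannot introduce new terms.

```python
import numpy as np

def lsa_fit(A, k):
    """Rank-k LSA model of a term-document matrix A (terms x docs)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def fold_in(U, s, Vt, d):
    """Fold a new document vector d into an existing LSA model.

    Projects d into the latent space without recomputing the SVD;
    U and s are unchanged, so the basis slowly goes stale and no
    new terms can be represented."""
    d_hat = (U.T @ d) / s              # k-dim representation of the new doc
    Vt_new = np.column_stack([Vt, d_hat])
    return U, s, Vt_new

# toy term-document matrix: 5 terms x 4 source files (made-up counts)
A = np.array([[1, 0, 0, 1],
              [1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1],
              [1, 0, 1, 0]], dtype=float)
U, s, Vt = lsa_fit(A, k=2)

d = np.array([1, 0, 1, 0, 0], dtype=float)  # a newly committed file
U, s, Vt = fold_in(U, s, Vt, d)
print(Vt.shape)                              # one extra document column
```

A full SVD update (as opposed to fold-in) would also revise U and s; the trade-off between the two is exactly the accuracy-versus-time question the abstract raises.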
“…One such contribution is the Incremental Latent Semantic Indexing (LSI) algorithm for search-based automatic traceability link recovery proposed by Jiang et al. [34]. In this paper, the authors propose an incremental approach based on the LSA model to update the links between the source code files and the documentation as they both evolve.…”
Section: B. Improvements in Retrieval Efficiency
Abstract: Information Retrieval (IR) based bug localization techniques use bug reports to query a software repository and retrieve relevant source files. These techniques index the source files in the software repository and train a model which is then queried for retrieval purposes. Much of the current research focuses on improving the retrieval effectiveness of these methods; however, little consideration has been given to the efficiency of such approaches for software repositories that are constantly evolving. As the software repository evolves, index creation and model learning must be repeated to ensure accurate retrieval for each new bug. In doing so, the query latency may be unreasonably high, and re-computing the index and the model for files that did not change is computationally redundant. We propose an incremental update framework that continuously updates the index and the model using the changes made at each commit. We demonstrate that the same retrieval accuracy can be achieved with a fraction of the time needed by current approaches. Our results are based on two basic IR modeling techniques: the Vector Space Model (VSM) and the Smoothed Unigram Model (SUM). The dataset we used in our validation experiments was created by tracking the commit history of the AspectJ and JodaTime software libraries over a span of 10 years.
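As background for the VSM side, the index-then-query pipeline described in the abstract can be sketched as follows. This is an illustrative example using scikit-learn's TfidfVectorizer and cosine similarity; the file names and contents are invented, and the paper's actual implementation may differ.

```python
# Minimal VSM bug-localization sketch: index source files with TF-IDF,
# query with a bug report, rank files by cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# made-up repository contents (in practice: tokenized source files)
source_files = {
    "Parser.java":    "parse token stream syntax error recovery",
    "Formatter.java": "format date time zone pattern output",
    "Cache.java":     "cache entry eviction lookup miss",
}
bug_report = "wrong time zone offset when formatting dates"

vectorizer = TfidfVectorizer()
index = vectorizer.fit_transform(source_files.values())  # index the repo
query = vectorizer.transform([bug_report])               # query with the bug
scores = cosine_similarity(query, index).ravel()
ranking = sorted(zip(source_files, scores), key=lambda p: -p[1])
print(ranking[0][0])  # Formatter.java ranks first (shares "time", "zone")
```

In a naive pipeline, `fit_transform` is rerun on the whole repository after every commit; the incremental framework described above avoids exactly that redundant recomputation for unchanged files.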
“…Some of the software engineering problems related to concept location which have been addressed using LSI are: traceability link recovery between source code and documentation [De Lucia et al. 2007; Jiang et al. 2008; Marcus et al. 2005a], tracing requirements [Hayes et al. 2006; Lo et al. 2006] and other software artifacts [Lormans and Van Deursen 2006], identifying clones in software [Marcus and Maletic 2001; Tairas and Gray 2009], retrieving relevant artifacts in project histories [Cubranic et al. 2005], and measuring coupling and cohesion [De Lucia et al. 2008; Marcus et al. 2008] of classes. In these applications, the documents are formed from the source code (that is, a document can be a class, method, function, package, etc.)…”
The paper addresses the problem of concept location in source code by proposing an approach that combines Formal Concept Analysis and Information Retrieval. In the proposed approach, Latent Semantic Indexing, an advanced Information Retrieval technique, is used to map textual descriptions of software features or bug reports to relevant parts of the source code, presented as a ranked list of source code elements. Given the ranked list, the approach selects the most relevant attributes from the best-ranked documents, clusters the results, and presents them as a concept lattice generated using Formal Concept Analysis.

The approach is evaluated through a large case study on concept location in the source code of six open-source systems, using several hundred features and bugs. The empirical study focuses on the analysis of various configurations of the generated concept lattices, and the results indicate that our approach is effective in organizing the different concepts and their relationships present in the subset of the search results. In consequence, the proposed concept location method has been shown to outperform a standalone Information Retrieval based concept location technique by reducing the number of irrelevant search results across all the systems and lattice configurations evaluated, potentially reducing programmers' effort during software maintenance tasks involving concept location.
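The FCA step described above builds a lattice whose nodes are formal concepts: maximal pairs of (files, shared attributes) closed under the two derivation maps. A brute-force sketch over a toy binary context (the file names and attribute terms are invented for illustration; this is not the paper's implementation, which would derive attributes from the top-ranked search results):

```python
# Enumerate the formal concepts of a small object-attribute context.
from itertools import combinations

# binary context: which top-ranked files contain which attribute terms
context = {
    "Formatter.java": {"date", "zone"},
    "Clock.java":     {"date", "offset"},
    "Zone.java":      {"zone", "offset"},
}
objects = list(context)
all_attrs = set().union(*context.values())

def intent(files):
    """Attributes shared by every file in the set (all attrs if empty)."""
    sets = [context[f] for f in files]
    return set.intersection(*sets) if sets else set(all_attrs)

def extent(attrs):
    """Files containing every attribute in the set."""
    return {f for f in objects if attrs <= context[f]}

# a formal concept is a pair (A, B) with B = intent(A) and A = extent(B)
concepts = set()
for r in range(len(objects) + 1):
    for combo in combinations(objects, r):
        B = intent(combo)           # shared attributes of the file subset
        A = frozenset(extent(B))    # all files having those attributes
        concepts.add((A, frozenset(B)))

for A, B in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(A), sorted(B))
```

Ordering these concepts by inclusion of their extents yields the lattice; clustering search results under shared attributes is what lets the approach group related files and filter irrelevant ones.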