Rocco Oliveto scite author profile

The main drawback of existing software artifact management systems is the lack of automatic or semi-automatic traceability link generation and maintenance. We have improved an artifact management system with a traceability recovery tool based on Latent Semantic Indexing (LSI), an information retrieval technique. We have assessed LSI to identify strengths and limitations of using information retrieval techniques for traceability recovery and devised the need for an incremental approach. The method and the tool have been evaluated during the development of seventeen software projects involving about 150 students. We observed that although tools based on information retrieval provide a useful support for the identification of traceability links during software development, they are still far to support a complete semi-automatic recovery of all links. The results of our experience have also shown that such tools can help to identify quality problems in the textual description of traced artifacts.

show abstract

How to effectively use topic models for software engineering tasks? An approach based on Genetic Algorithms

Panichella

et al. 2013

View full text Add to dashboard Cite

Abstract-Information Retrieval (IR) methods, and in particular topic models, have recently been used to support essential software engineering (SE) tasks, by enabling software textual retrieval and analysis. In all these approaches, topic models have been used on software artifacts in a similar manner as they were used on natural language documents (e.g., using the same settings and parameters) because the underlying assumption was that source code and natural language documents are similar. However, applying topic models on software data using the same settings as for natural language text did not always produce the expected results.Recent research investigated this assumption and showed that source code is much more repetitive and predictable as compared to the natural language text. Our paper builds on this new fundamental finding and proposes a novel solution to adapt, configure and effectively use a topic modeling technique, namely Latent Dirichlet Allocation (LDA), to achieve better (acceptable) performance across various SE tasks. Our paper introduces a novel solution called LDA-GA, which uses Genetic Algorithms (GA) to determine a near-optimal configuration for LDA in the context of three different SE tasks: (1) traceability link recovery, (2) feature location, and (3) software artifact labeling. The results of our empirical studies demonstrate that LDA-GA is able to identify robust LDA configurations, which lead to a higher accuracy on all the datasets for these SE tasks as compared to previously published results, heuristics, and the results of a combinatorial search.

show abstract

An experimental investigation on the innate relationship between quality and refactoring

Bavota

Lucia

Penta

et al. 2015

Journal of Systems and Software

198

203

View full text Add to dashboard Cite

On the diffuseness and the impact on maintainability of code smells: a large scale empirical investigation

et al. 2017

View full text Add to dashboard Cite

Code smells are symptoms of poor design and implementation choices that may hinder code comprehensibility and maintainability. Despite the effort devoted by the research community in studying code smells, the extent to which code smells in software systems affect software maintainability remains still unclear. In this paper we present a large scale empirical investigation on the diffuseness of code smells and their impact on code change-and fault-proneness. The study was conducted across a total of 395 releases of 30

show abstract

Automatic query reformulations for text retrieval in software engineering

Haiduc¹,

Bavota²,

Marcus³

et al. 2013

178

214

View full text Add to dashboard Cite

Abstract-There are more than twenty distinct software engineering tasks addressed with text retrieval (TR) techniques, such as, traceability link recovery, feature location, refactoring, reuse, etc. A common issue with all TR applications is that the results of the retrieval depend largely on the quality of the query. When a query performs poorly, it has to be reformulated and this is a difficult task for someone who had trouble writing a good query in the first place.We propose a recommender (called Refoqus) based on machine learning, which is trained with a sample of queries and relevant results. Then, for a given query, it automatically recommends a reformulation strategy that should improve its performance, based on the properties of the query. We evaluated Refoqus empirically against four baseline approaches that are used in natural language document retrieval. The data used for the evaluation corresponds to changes from five open source systems in Java and C++ and it is used in the context of TR-based concept location in source code. Refoqus outperformed the baselines and its recommendations lead to query performance improvement or preservation in 84% of the cases (in average).

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Rocco Oliveto

Recovering traceability links in software artifact management systems using information retrieval methods

How to effectively use topic models for software engineering tasks? An approach based on Genetic Algorithms

An experimental investigation on the innate relationship between quality and refactoring

On the diffuseness and the impact on maintainability of code smells: a large scale empirical investigation

Automatic query reformulations for text retrieval in software engineering

Contact Info

Product

Resources

About