Sonia Haiduc scite author profile

There are more than twenty distinct software engineering tasks addressed with text retrieval (TR) techniques, such as traceability link recovery, feature location, refactoring, and reuse. A common issue with all TR applications is that the results of the retrieval depend largely on the quality of the query. When a query performs poorly, it has to be reformulated, and this is a difficult task for someone who had trouble writing a good query in the first place. We propose a recommender (called Refoqus) based on machine learning, which is trained with a sample of queries and relevant results. Then, for a given query, it automatically recommends a reformulation strategy that should improve its performance, based on the properties of the query. We evaluated Refoqus empirically against four baseline approaches that are used in natural language document retrieval. The data used for the evaluation corresponds to changes from five open source systems in Java and C++ and it is used in the context of TR-based concept location in source code. Refoqus outperformed the baselines and its recommendations lead to query performance improvement or preservation in 84% of the cases (on average).

Supporting program comprehension with source code summarization

Haiduc

2010

One of the main challenges faced by today's developers is keeping up with the staggering amount of source code that needs to be read and understood. In order to help developers with this problem and reduce the costs associated with it, one solution is to use simple textual descriptions of source code entities that developers can grasp easily, while capturing the code semantics precisely. We propose an approach to automatically determine such descriptions, based on automated text summarization technology.

On the use of relevance feedback in IR-based concept location

2009

On the relationship between bug reports and queries for text retrieval-based bug localization

et al. 2020

Are Bug Reports Enough for Text Retrieval-Based Bug Localization?

Mills

Pantiuchina

Parra

et al. 2018

Query-based configuration of text retrieval solutions for software engineering tasks

Moreno

Bavota

Haiduc

et al. 2015

On the Use of Domain Terms in Source Code

Haiduc

Marcus

2008

Information about the problem domain of the software and the solution it implements is often embedded by developers in comments and identifiers. When using software developed by others or when they are new to a project, programmers know little about how domain information is reflected in the source code. Programmers often learn about the domain from external sources such as books and articles. Hence, it is important that comments and identifiers use terms that are commonly known in the domain literature, as it is likely that programmers will use such terms when searching the source code. The paper presents a case study that investigated how domain terms are used in comments and identifiers. The study focused on three research questions: (1) to what degree are domain terms found in the source code of software from a particular problem domain? (2) which is the preponderant source of domain terms: identifiers or comments? and (3) to what degree are domain terms shared between several systems from the same problem domain? Within the studied software, we found that, on average, 42% of the domain terms were used in the source code; 23% of the domain terms used in the source code are present in comments only, whereas only 11% are present in identifiers only; and there is a 63% agreement in the use of domain terms between any two software systems.

Sonia Haiduc

On the Use of Automated Text Summarization Techniques for Summarizing Source Code

Automatic query reformulations for text retrieval in software engineering
