Fabian Gilson scite author profile

Topic modeling using models such as Latent Dirichlet Allocation (LDA) is a text mining technique to extract human-readable semantic “topics” (i.e., word clusters) from a corpus of textual documents. In software engineering, topic modeling has been used to analyze textual data in empirical studies (e.g., to find out what developers talk about online), but also to build new techniques to support software engineering tasks (e.g., to support source code comprehension). Topic modeling needs to be applied carefully (e.g., depending on the type of textual data analyzed and modeling parameters). Our study aims at describing how topic modeling has been applied in software engineering research with a focus on four aspects: (1) which topic models and modeling techniques have been applied, (2) which textual inputs have been used for topic modeling, (3) how textual data was “prepared” (i.e., pre-processed) for topic modeling, and (4) how generated topics (i.e., word clusters) were named to give them a human-understandable meaning. We analyzed topic modeling as applied in 111 papers from ten highly-ranked software engineering venues (five journals and five conferences) published between 2009 and 2020. We found that (1) LDA and LDA-based techniques are the most frequent topic modeling techniques, (2) developer communication and bug reports have been modelled most, (3) data pre-processing and modeling parameters vary quite a bit and are often vaguely reported, and (4) manual topic naming (such as deducting names based on frequent words in a topic) is common.

show abstract

From User Stories to Use Case Scenarios towards a Generative Approach

Gilson

Irwin

2018

View full text Add to dashboard Cite

Generating Use Case Scenarios from User Stories

Gilson

Galster

2020

View full text Add to dashboard Cite

How junior developers deal with their technical debt?

Gilson

Morales-Trujillo

Mathews

2020

View full text Add to dashboard Cite

Technical debt is a metaphor that measures the additional effort needed to continue to add more features in a software due to its inherent decrease in code quality. Most software systems suffer from technical debt at some point so that dedicated tools and metrics have been developed to monitor such debt. Alongside tools, appropriate engineering practices must be put in place by the development team to keep that debt at an acceptable level. In this empirical study, we observed and surveyed Scrum development teams composed of experienced students in order to understand their quality-related processes on a year-long academic project. We found that (1) students do use static analysis tools of many forms, but their actual usage is limited due to time pressure; (2) retrospective and non-constraining feedback on code quality has little to no effect, even when given regularly during the course of the project; and (3) junior developers value composite quality indicators (e.g., maintainability, reliability in SonarQube), even if they do not fully understand their meaning. From our findings, we propose a series of recommendations, both technical and methodological, on how to train junior developers to understand and manage technical debt. CCS CONCEPTS• Software and its engineering → Agile software development; Maintaining software; • Social and professional topics → Software engineering education;

show abstract

What Quality Attributes Can We Find in Product Backlogs? A Machine Learning Perspective

Galster

Gilson

2019

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Fabian Gilson

Topic modeling in software engineering research

From User Stories to Use Case Scenarios towards a Generative Approach

Generating Use Case Scenarios from User Stories

How junior developers deal with their technical debt?

What Quality Attributes Can We Find in Product Backlogs? A Machine Learning Perspective

Contact Info

Product

Resources

About