Abstract: Being able to identify software discussions that are primarily about design, which we call design mining, can improve documentation and maintenance of software systems. Existing design mining approaches achieve good classification performance using natural language processing (NLP) techniques, but their conclusion stability is generally poor: a classifier trained on a given dataset of software projects has so far not worked well on different artifacts or different datasets. In this study, we replicate an existing state-of-the-art design mining study to show how conclusion stability suffers across different artifact types and projects.
“…Previous studies have introduced various vectorization techniques. In response to our previous study [15], we demonstrate how word embedding as a vectorization choice can improve the performance of the classifier. However, word embedding needs a reference model.…”
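The snippet above notes that word embedding as a vectorization choice requires a reference model. A minimal sketch of the idea follows; the three-dimensional toy vectors are hypothetical stand-ins for a real pretrained reference model (the paper's actual embeddings are not reproduced here), but the mechanics of averaging word vectors into a document vector are the same.

```python
# Sketch of word-embedding vectorization for discussion text.
# The 3-dimensional toy vectors below are hypothetical; a real
# pipeline would load a reference model pretrained on a large corpus.
import math

TOY_EMBEDDINGS = {
    "refactor":  [0.90, 0.10, 0.00],
    "interface": [0.80, 0.20, 0.10],
    "design":    [0.85, 0.15, 0.05],
    "crash":     [0.10, 0.90, 0.20],
    "nullpointerexception": [0.05, 0.95, 0.30],
}

def embed(text, embeddings=TOY_EMBEDDINGS, dim=3):
    """Average the vectors of known words; out-of-vocabulary words are skipped."""
    vectors = [embeddings[w] for w in text.lower().split() if w in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(component) / len(vectors) for component in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

design_vec = embed("should we refactor this interface design")
bug_vec = embed("crash with NullPointerException")
```

With such a representation, a design-related comment lands close (in cosine distance) to other design vocabulary and far from defect-report vocabulary, which is what makes embeddings a useful vectorization choice for the downstream classifier.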
Section: How Useful Are Software-specific Word Vectorizers? (mentioning)
confidence: 68%
“…Early results from Robbes and Janes [23] reported on using ULMFiT [24] for sentiment analysis with some success. We also use the transfer NLP potential of ULMFiT, which we discuss in [15]. Robbes and Janes emphasized the importance of pretraining the learner on (potentially small) task-specific datasets.…”
Section: Cross-project Classifiers In Software Engineering (mentioning)
Developer discussions range from in-person hallway chats to comment chains on bug reports. Being able to identify discussions that touch on software design would be helpful in documenting and refactoring software. Design mining is the application of machine learning techniques to correctly label a given discussion artifact, such as a pull request, as pertaining (or not) to design. In this paper we demonstrate a simple example of how design mining works. We then show how conclusion stability is poor across different artifact types and different projects. We present two techniques, augmentation and context specificity, that greatly improve the conclusion stability and cross-project relevance of design mining. Our new approach achieves an AUC of 0.88 on the within-dataset classification task and 0.80 on the cross-dataset classification task.
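The abstract's pipeline, labeling discussion artifacts as design or non-design and evaluating with AUC, can be illustrated with a toy sketch. The keyword scorer and the labeled examples below are hypothetical stand-ins for the paper's trained NLP classifiers and real project data; only the AUC computation (probability that a random positive outranks a random negative) matches the metric reported above.

```python
# Toy illustration of design mining plus the AUC-ROC metric.
# DESIGN_CUES and the dataset are assumptions for illustration only;
# the actual study uses trained NLP classifiers, not keyword matching.
DESIGN_CUES = {"design", "architecture", "refactor", "coupling", "interface"}

def design_score(comment):
    """Fraction of words that are design cues: a crude classifier score."""
    words = comment.lower().split()
    return sum(w in DESIGN_CUES for w in words) / len(words)

def auc(scored):
    """AUC-ROC: probability a random positive scores above a random negative,
    counting ties as half a win (the rank-statistic formulation)."""
    pos = [s for s, label in scored if label]
    neg = [s for s, label in scored if not label]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# (comment, is_design_discussion) pairs; labels assumed for illustration
dataset = [
    ("we should refactor this interface to reduce coupling", True),
    ("the new architecture splits storage from rendering", True),
    ("build fails on CI after the latest merge", False),
    ("typo in the user guide, fixing now", False),
]
scored = [(design_score(text), label) for text, label in dataset]
```

On this four-example toy set the cue-based scorer separates the classes perfectly (AUC of 1.0); the paper's 0.88 within-dataset and 0.80 cross-dataset figures reflect the much harder setting of real, noisy discussion data.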
“…As manual classification is not a practical option to classify 1,661,922 discussions, we use machine learning techniques. We followed the protocol of Brunet et al [19] with some improvisations suggested by Mahadi et al [45].…”
Section: Building the Discussion Classifier (mentioning)
confidence: 99%
“…Viviani et al [75] applied a classifier to automatically locate paragraphs in pull request discussions related to design. Mahadi et al [45] trained a classifier on the dataset created by Brunet et al [19] and tested it on the dataset of Viviani et al [75]. However, both of these datasets include discussions only from pull requests.…”
Section: Related Work (mentioning)
confidence: 99%
“…For example, Brunet et al [19] and Viviani et al [75,76] examined how design discussions are embedded in pull request comments and how it can be difficult for developers to piece together these discussions. Researchers have also developed techniques for detecting design discussions from a (single) communication channel using Machine Learning (ML) techniques [19,45,75,76].…”
Developer discussions range from in-person hallway chats to comment chains on bug reports. Being able to identify discussions that touch on software design would be helpful in documenting and refactoring software. Design mining is the application of machine learning techniques to correctly label a given discussion artifact, such as a pull request, as pertaining (or not) to design. In this work we demonstrate a simple example of how design mining works. We first replicate an existing state-of-the-art design mining study to show how conclusion stability is poor across different artifact types and different projects. Then we introduce two techniques, augmentation and context specificity, that greatly improve the conclusion stability and cross-project relevance of design mining. Our new approach achieves an AUC-ROC of 0.88 on the within-dataset classification task and 0.84 on the cross-dataset classification task.