2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER) 2020
DOI: 10.1109/saner48275.2020.9054792

Cross-Dataset Design Discussion Mining

Abstract: Being able to identify software discussions that are primarily about design, which we call design mining, can improve documentation and maintenance of software systems. Existing design mining approaches have good classification performance using natural language processing (NLP) techniques, but the conclusion stability of these approaches is generally poor. A classifier trained on a given dataset of software projects has so far not worked well on different artifacts or different datasets. In this study, we repli…

Citations: cited by 10 publications (15 citation statements)
References: 39 publications
“…Previous studies have introduced various vectorization techniques. In response to our previous study [15], we demonstrate how word embedding as a vectorization choice can improve the performance of the classifier. However, word embedding needs a reference model.…”
Section: How Useful Are Software-specific Word Vectorizers? (mentioning)
confidence: 68%
“…Early results from Robbes and Janes [23] reported on using ULMFiT [24] for sentiment analysis with some success. We also use the transfer NLP potential of ULMFiT, which we discuss in [15]. Robbes and Janes emphasized the importance of pretraining the learner on (potentially small) task-specific datasets.…”
Section: Cross-project Classifiers In Software Engineering (mentioning)
confidence: 99%
“…As manual classification is not a practical option to classify 1,661,922 discussions, we use machine learning techniques. We followed the protocol of Brunet et al [19] with some improvisations suggested by Mahadi et al [45].…”
Section: Building the Discussion Classifier (mentioning)
confidence: 99%
“…Viviani et al [75] applied a classifier to automatically locate paragraphs in pull request discussions related to design. Mahadi et al [45] trained a classifier on the dataset created by Brunet et al [19] and tested it on the dataset of Viviani et al [75]. However, both of the dataset include discussions only from pull requests.…”
Section: Related Work (mentioning)
confidence: 99%