Bug triage processes are intended to assign bug reports to appropriate developers effectively, but they typically become bottlenecks in the development process, especially for large-scale software projects. Recently, several machine learning approaches, including deep learning-based approaches, have been proposed to recommend an appropriate developer automatically by learning past assignment patterns. In this paper, we propose a deep learning-based bug triage technique using a convolutional neural network (CNN) with three different word representation techniques: Word to Vector (Word2Vec), Global Vectors (GloVe), and Embeddings from Language Models (ELMo). Experiments were performed on datasets from well-known large-scale open-source projects, such as Eclipse and Mozilla, and top-k accuracy was used as the evaluation metric. The experimental results suggest that the ELMo-based CNN approach performs best for the bug triage problem. GloVe-based CNN slightly outperforms Word2Vec-based CNN in many cases, whereas Word2Vec-based CNN outperforms GloVe-based CNN when the number of samples per class in the dataset is high enough.
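Top-k accuracy, the evaluation metric named above, counts a recommendation as correct when the developer who actually fixed the bug appears among the k highest-ranked candidates. The following is a minimal sketch of that computation under stated assumptions: the function name `top_k_accuracy`, the developer names, and the ranked lists are all hypothetical illustrations and are not taken from the paper or its datasets.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    actual_developers: Sequence[str],
    k: int,
) -> float:
    """Fraction of bug reports whose actual developer is in the top-k ranking.

    ranked_predictions[i] is a list of candidate developers for report i,
    ordered from most to least likely (hypothetical model output).
    """
    if len(ranked_predictions) != len(actual_developers):
        raise ValueError("each bug report needs exactly one actual developer")
    if k < 1:
        raise ValueError("k must be at least 1")
    if not actual_developers:
        return 0.0
    hits = sum(
        1
        for ranked, developer in zip(ranked_predictions, actual_developers)
        if developer in ranked[:k]
    )
    return hits / len(actual_developers)


# Hypothetical rankings for four bug reports (illustrative data only).
ranked = [
    ["alice", "bob", "carol"],
    ["bob", "carol", "alice"],
    ["carol", "alice", "bob"],
    ["alice", "carol", "bob"],
]
actual = ["alice", "carol", "bob", "bob"]

for k in (1, 2, 3):
    print(f"top-{k} accuracy: {top_k_accuracy(ranked, actual, k):.2f}")
```

Because a hit at rank 1 is also a hit at every larger k, the metric never decreases as k grows, which is why triage studies usually report it for several values of k.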
As defects become more widespread in software development and advancement, bug triaging has become imperative for software testing and maintenance. The bug triage process assigns an appropriate developer to a bug report. Many automated and semi-automated systems have been proposed in the last decade, and some recent techniques have provided direction for developing an effective triage system. However, these techniques still require improvement. Another open challenge is adding new developers to an existing triage system, which is difficult because new developers have no recorded triage history. This paper proposes a transformer-based bug triage system that uses bidirectional encoder representations from transformers (BERT) for word representation. The proposed model can add a new developer to the existing system without retraining the model from scratch. To add new developers, we assumed that they had a triage history created by a manual triager or human triage manager who had learned their skills from the existing developer history. The existing model was then fine-tuned on this manual triage history to add the new developers. Experiments were conducted using datasets from well-known large-scale open-source projects, such as Eclipse and Mozilla, and top-k accuracy was used as the evaluation criterion. The experimental results suggest that the proposed triage system outperforms other word-embedding-based triage methods on the bug triage problem. Additionally, the proposed method performs best at adding new developers to an existing bug triage system without retraining on the whole dataset.
Many bugs and defects occur during software testing and maintenance. These bugs should be resolved as soon as possible to improve software quality. Bug triage aims to resolve these bugs by assigning each reported bug to an appropriate developer or list of developers. Assigning an appropriate developer to a bug report is an arduous task for a human triager when there are several developers with different skills, and several automated and semi-automated triage systems have been proposed in the last decade. Some recent techniques have suggested possibilities for the development of an effective triage system. However, these techniques require improvement. In previous work, we proposed a heterogeneous graph representation for bug triage, using word-word edges and word-bug document co-occurrences to build a heterogeneous graph of bug data. Cosine similarity was used to weight the word-word edges, and a graph convolution network was then used to learn the heterogeneous graph representation. This paper extends our previous work by adopting different similarity metrics and correlation metrics for weighting word-word edges. The method was validated using small and large datasets obtained from large-scale open-source projects. The top-k accuracy metric was used to evaluate the performance of the bug triage system. The experimental results showed that weighting word-word edges with point-wise mutual information performed better than the other word-word weighting methods, and that our method achieved better accuracy on large datasets than other recent state-of-the-art methods.
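To make the point-wise mutual information (PMI) weighting of word-word edges concrete, here is a minimal sketch under stated assumptions. The abstract does not say how co-occurrence is counted, so this example assumes sliding windows over each tokenized bug report and keeps only edges with positive PMI, a common choice in text graph convolution setups. The function name `pmi_edge_weights`, the `window_size` parameter, and the sample reports are hypothetical and do not come from the paper.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def pmi_edge_weights(
    documents: List[List[str]], window_size: int = 3
) -> Dict[Tuple[str, str], float]:
    """Weight word-word edges by PMI over sliding windows (assumed scheme).

    PMI(a, b) = log( p(a, b) / (p(a) * p(b)) ), where probabilities are the
    fractions of windows containing the word(s). Only positive PMI is kept.
    """
    # Collect the set of distinct words in every sliding window.
    windows = []
    for tokens in documents:
        if len(tokens) <= window_size:
            windows.append(set(tokens))
        else:
            for start in range(len(tokens) - window_size + 1):
                windows.append(set(tokens[start:start + window_size]))

    total = len(windows)
    word_count: Counter = Counter()
    pair_count: Counter = Counter()
    for window in windows:
        word_count.update(window)
        # Sort so that each unordered pair has a single canonical key.
        pair_count.update(combinations(sorted(window), 2))

    weights = {}
    for (a, b), joint in pair_count.items():
        pmi = math.log(joint * total / (word_count[a] * word_count[b]))
        if pmi > 0:  # assumption: drop non-positive associations
            weights[(a, b)] = pmi
    return weights


# Hypothetical tokenized bug reports (illustrative data only).
reports = [
    ["crash", "on", "startup"],
    ["crash", "on", "save"],
    ["ui", "freeze", "startup"],
]
for edge, weight in sorted(pmi_edge_weights(reports).items()):
    print(edge, round(weight, 3))
```

Unlike cosine similarity between word vectors, this weight depends only on co-occurrence counts in the bug corpus itself, so no pretrained embedding is needed to build the word-word part of the graph.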