2019
DOI: 10.1007/s42452-019-1419-y

New labeled dataset of interconnected lexical typos for automatic correction in the bug reports

Abstract: Large-scale projects, especially open-source ones, use software triage systems such as Bugzilla to manage their users' requests, including bugs, suggestions, and requirements. Software triage systems perform many tasks automatically, such as prioritizing bug reports, finding duplicates, and assigning reports to developers, all of which require text mining, information retrieval, and natural language processing techniques. We already showed there are many typos in the bug reports which reduce the performance of artificial intelligence technique…

Citations: cited by 6 publications (1 citation statement)
References: 12 publications
“…The first methodology called the information retrieval-based approach, which its procedure is shown in Figure 1. In the first box, the raw dataset of bug reports exists which should be preprocessed in box 2 till deal with null values, unify the data type of some fields like version and priority and preferably change them to numerical, remove stop words from textual fields, stemming textual fields, correcting the typos in textual DFs [5,8] [9], and so on [1,4]. The feature extraction phase of box 6 returns a numerical vector consist of many similarity metrics as box 7.…”
Section: Information Retrieval (IR)-based Methodology of Automatic Duplicate Bug Report Detection (ADBRD)
Citation type: mentioning
confidence: 99%
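
The preprocessing and feature-extraction steps named in the citation statement above (handling null values, converting priority to a number, removing stop words, stemming, correcting typos, and computing similarity metrics) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the method of either paper: the field names `summary` and `priority`, the default priority `P3`, the tiny stop-word list, the vocabulary, the suffix-stripping stemmer, and the use of `difflib` for typo correction are all hypothetical choices made for this example.

```python
import difflib
import re

# Illustrative assumptions, not taken from the cited papers.
STOP_WORDS = {"the", "a", "an", "is", "on", "in", "when", "to", "of", "and"}
VOCABULARY = ["crash", "browser", "open", "file", "button", "click"]


def correct_typo(token, vocabulary=VOCABULARY, cutoff=0.8):
    """Return the closest vocabulary entry if it is similar enough, else the token."""
    matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def naive_stem(token):
    """Strip a few common suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(report):
    """Normalize one bug report: nulls, numeric priority, cleaned tokens."""
    text = (report.get("summary") or "").lower()       # null summary -> empty text
    tokens = re.findall(r"[a-z]+", text)
    tokens = [t for t in tokens if t not in STOP_WORDS]
    tokens = [naive_stem(correct_typo(t)) for t in tokens]
    priority = report.get("priority") or "P3"          # assumed default for nulls
    return {"tokens": tokens, "priority": int(priority.lstrip("Pp"))}


def jaccard(a, b):
    """Jaccard similarity between two token lists, treated as sets."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def similarity_features(report_1, report_2):
    """Return a small numerical feature vector for a pair of reports."""
    p1, p2 = preprocess(report_1), preprocess(report_2)
    return [
        jaccard(p1["tokens"], p2["tokens"]),
        float(abs(p1["priority"] - p2["priority"])),
    ]


if __name__ == "__main__":
    r1 = {"summary": "Browser crashs when opening a file", "priority": "P2"}
    r2 = {"summary": "The broswer crashed on file open", "priority": None}
    print(similarity_features(r1, r2))
```

Correcting typos before stemming matters in this sketch: without it, `crashs` and `broswer` would not match their intended forms, and the token-overlap feature for two reports describing the same bug would drop.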