Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering 2017
DOI: 10.1145/3084226.3084273
On Using Active Learning and Self-training when Mining Performance Discussions on Stack Overflow

Abstract: Abundant data is the key to successful machine learning. However, supervised learning requires annotated data that are often hard to obtain. In a classification task with limited resources, Active Learning (AL) promises to guide annotators to examples that bring the most value for a classifier. AL can be successfully combined with self-training, i.e., extending a training set with the unlabelled examples for which a classifier is the most certain. We report our experiences on using AL in a systematic manner to…
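As a reading aid, here is a minimal sketch (not the paper's implementation) of the pool-based active learning loop the abstract alludes to, using uncertainty sampling with a scikit-learn classifier; all data, labels, and variable names below are illustrative assumptions.

```python
# Minimal sketch of pool-based active learning with uncertainty sampling.
# The texts, labels, and names are made up for illustration only.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

seed_texts = ["app is slow under load", "how to parse json",
              "query latency spikes", "rename a git branch"]
seed_labels = [1, 0, 1, 0]  # 1 = performance discussion, 0 = other
pool_texts = ["high cpu usage in loop", "center a div with css",
              "throughput drops after upgrade", "regex for email"]

vec = TfidfVectorizer()
X_seed = vec.fit_transform(seed_texts)
clf = LogisticRegression(max_iter=1000).fit(X_seed, seed_labels)

# Uncertainty sampling: route the pool examples with the lowest
# top-class probability to the human annotators first.
proba = clf.predict_proba(vec.transform(pool_texts))
order = np.argsort(proba.max(axis=1))   # least confident first
print([pool_texts[i] for i in order[:2]])  # next batch to annotate
```

Uncertainty sampling is one common query strategy; the value of AL here is that each annotated example is chosen to maximally improve the classifier rather than drawn at random.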

Cited by 4 publications (4 citation statements)
References 26 publications
“…The obvious first step is to extend the dataset used for both the classification model and the CycleGAN. As the data labelling is labor-intensive, we plan to rely on our previous experience in active learning to focus annotation effort for maximum return on investment [4]. With more data, we can train the classifier to predict additional classes, including input related to emergency response.…”
Section: Discussion
Mentioning confidence: 99%
“…Active learning techniques select the most informative unlabeled examples to predict their label and include them in the training set [63]. Many researchers have successfully combined active learning with self-training to reduce the human labeling effort and enhance classification performance [33,64]. Motivated by the existing research [33], we integrate active learning with self-training to select the most informative and highest-confidence examples.…”
Section: Active Self-training Based Sentiment Learner
Mentioning confidence: 99%
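The statement above describes absorbing high-confidence pseudo-labels into the training set. A minimal sketch of that self-training step, compatible with the classifier from the earlier sketch; the 0.9 threshold and all names are assumptions, not values from the cited work.

```python
# Sketch of one self-training step: pseudo-label the unlabeled pool and
# keep only predictions above a confidence threshold. The 0.9 threshold
# and the variable names are illustrative assumptions.
import numpy as np
from scipy.sparse import vstack

def self_train_step(clf, X_labeled, y_labeled, X_pool, threshold=0.9):
    proba = clf.predict_proba(X_pool)
    conf = proba.max(axis=1)
    keep = conf >= threshold                  # most-certain pseudo-labels only
    X_aug = vstack([X_labeled, X_pool[keep]])
    y_aug = np.concatenate([y_labeled, proba[keep].argmax(axis=1)])
    clf.fit(X_aug, y_aug)                     # retrain on the extended set
    return clf, ~keep                         # classifier + mask of remaining pool
```

In a combined AL/self-training loop, each round typically sends the least-confident pool examples to annotators and the most-confident ones through this step.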
“…In software engineering (SE), there are studies reporting low values for expert agreement/reliability using Krippendorff's alpha and/or ICC, e.g., Borg et al. [4], Anvaari et al. [1], and Kitchenham et al. [27]. Evaluations depend on the interpretation of the construct under study, i.e., they involve some degree of subjectivity [5,47].…”
Section: Assessment Of Responses
Mentioning confidence: 99%
“…Values of α ≥ 0.800 are suggested for drawing reliable conclusions, while values 0.667 ≤ α < 0.800 support tentative conclusions only [29]. We used the R function kripp.alpha to measure the level of agreement among the respondents (raters) on the criteria (subjects) of the top 6 most evaluated tools. We considered the level of measurement for the data to be ratio, since the possible values (from 0 to 10 at intervals of 0.5, i.e., 21 levels) are equally spaced, ordered units with an absolute zero.…”
Section: Krippendorff's Alpha
Mentioning confidence: 99%
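The statement above computes agreement with R's kripp.alpha at the ratio level. A small equivalent sketch using the third-party Python package krippendorff instead; the rating matrix is made up (rows are raters, columns are rated criteria, NaN marks a missing rating), and the thresholds follow the [29] guidance quoted above.

```python
# Sketch of a Krippendorff's alpha computation at the ratio level, using
# the third-party `krippendorff` package (pip install krippendorff).
# The ratings are made up; scores run from 0 to 10 in steps of 0.5.
import numpy as np
import krippendorff

ratings = np.array([
    [8.0, 6.5, 9.0, 7.5],      # rater 1
    [7.5, 6.0, 9.0, 8.0],      # rater 2
    [8.0, 6.5, np.nan, 7.5],   # rater 3, one missing rating
])

alpha = krippendorff.alpha(reliability_data=ratings,
                           level_of_measurement="ratio")
if alpha >= 0.800:
    verdict = "reliable conclusions"
elif alpha >= 0.667:
    verdict = "tentative conclusions only"
else:
    verdict = "below the tentative threshold"
print(f"alpha = {alpha:.3f} ({verdict})")
```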