2014
DOI: 10.1007/978-3-319-09940-8_19
Less is More: Temporal Fault Predictive Performance over Multiple Hadoop Releases

Abstract: We investigate search-based fault prediction over time using 8 consecutive Hadoop versions, aiming to analyse the impact of chronology on fault prediction performance. Our results confound the assumption, implicit in previous work, that additional information from historical versions improves prediction: although G-mean tends to improve, Recall can be reduced.
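The abstract's finding hinges on the two metrics moving in opposite directions. As a reminder of what they measure, here is a minimal stdlib-only sketch of Recall and G-mean computed from a binary confusion matrix (the numbers are illustrative, not from the paper):

```python
# Illustrative only: Recall and G-mean for a binary fault-prediction
# confusion matrix (tp = faulty components correctly flagged, etc.).

def recall(tp, fn):
    """Fraction of truly faulty components that were flagged."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def specificity(tn, fp):
    """Fraction of non-faulty components correctly left unflagged."""
    return tn / (tn + fp) if (tn + fp) else 0.0

def g_mean(tp, fn, tn, fp):
    # Geometric mean of recall and specificity: it rewards balanced
    # performance on both the faulty and the non-faulty class, so it
    # can rise even while recall alone falls.
    return (recall(tp, fn) * specificity(tn, fp)) ** 0.5

# Hypothetical example: 30 faults caught, 10 missed, 80 clean components
# correctly classified, 20 false alarms.
print(recall(30, 10))          # 0.75
print(g_mean(30, 10, 80, 20))  # sqrt(0.75 * 0.8)
```

This is why the paper can report G-mean improving while Recall degrades: a model trained on more history may trade missed faults for fewer false alarms.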

Cited by 34 publications (33 citation statements)
References 17 publications (24 reference statements)
“…For every considered release, we iteratively train on the previous release(s) and evaluate on the current one. We consider two typical cases addressed in previous work: training on the last release [18,20] and training on the last three releases [37,38]. We start the evaluation from the fourth release onwards (as we need at least three releases on which to train the predictive models) and we consider releases with at least 10 vulnerable components.…”
Section: Experimental Design and Analysis, 6.1 Methodology
confidence: 99%
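The walk-forward protocol this excerpt describes (train on the previous release or previous three, test on the current one, start from the fourth release, skip releases with fewer than 10 vulnerable components) can be sketched as follows. The data layout and names are assumptions for illustration, not the cited paper's code:

```python
# Sketch (assumed data layout): walk-forward evaluation over consecutive
# software releases. `releases` is a chronologically ordered list of
# (features, labels) pairs, where labels are 1 for vulnerable/faulty
# components and 0 otherwise.

def walk_forward(releases, train_window, min_vulnerable=10, start=3):
    """Yield (train, test) splits: for each release from the fourth
    onwards, train on the previous `train_window` releases and test on
    the current one, skipping test releases with too few vulnerable
    components."""
    for i in range(start, len(releases)):
        test_X, test_y = releases[i]
        if sum(test_y) < min_vulnerable:
            continue  # excerpt's criterion: at least 10 vulnerable components
        window = releases[max(0, i - train_window):i]
        train_X = [x for X, _ in window for x in X]
        train_y = [y for _, Y in window for y in Y]
        yield (train_X, train_y), (test_X, test_y)

# The two cases from the excerpt: train_window=1 (last release) and
# train_window=3 (last three releases).
```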
“…However, the limitations of the study reduce its observational scope, because (i) the tests used to evaluate a change were selected according to a structural criterion (e.g., coverage), assessing the changes from a perspective different from the original; (ii) the experiment optimised the functions separately, observing improvements only in this isolated context; and (iii) the functions were not updated in the original software for evaluation and comparison with all of them updated. Harman et al (2014) applied GI in the migration and transplantation of functionalities between software systems in operation. The researchers experimented with an instant messaging system (Pidgin) and a text translation system (Babel Fish).…”
Section: Related Work
confidence: 99%
“…Among them, machine learners and regression algorithms such as Decision Trees, Logistic Regression and Naïve Bayes are widely used [12,36,22]. Recently, also Search-Based approaches have been successfully exploited (e.g., [1,11,25,45,68]). However, according to recent systematic literature reviews [22,66], the choice of a modelling technique seems to have less impact on the classification accuracy of a model than the choice of a metrics set.…”
Section: Software Fault Prediction
confidence: 99%
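Of the learner families this excerpt lists, Naïve Bayes is the simplest to show concretely. Below is a minimal, stdlib-only Gaussian Naïve Bayes sketch, purely to illustrate the classifier family the survey refers to; it is not the cited paper's implementation, and real studies would typically use a library such as scikit-learn:

```python
# Minimal Gaussian Naive Bayes (illustrative, stdlib only).
import math
from collections import defaultdict

def fit_gnb(X, y):
    """Estimate per-class feature means/variances and log-priors."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    model = {}
    for c, rows in groups.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[c] = (math.log(n / len(X)), means, variances)
    return model

def predict_gnb(model, x):
    """Return the class with the highest Gaussian log-likelihood."""
    def loglik(c):
        prior, means, variances = model[c]
        return prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances))
    return max(model, key=loglik)
```

The excerpt's closing point still stands: which learner is chosen tends to matter less than which metrics feed it.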
“…Moreover, our analysis was performed on data belonging to the same software version, so these results might be valid only for the current version. To mitigate this threat, we plan to investigate mutation-based metrics in our future work, both for next-release [25] and cross-project fault prediction [48,63].…”
Section: Threats To Validity
confidence: 99%