Fayola Peters scite author profile

Better cross company defect prediction

Abstract: How can we find data for quality prediction? Early in the life cycle, projects may lack the data needed to build such predictors. Prior work assumed that relevant training data was found nearest to the local project. But is this the best approach? This paper introduces the Peters filter, which is based on the following conjecture: when local data is scarce, more information exists in other projects. Accordingly, this filter selects training data via the structure of other projects. To assess the performance of the Peters filter, we compare it with two other approaches for quality prediction: within-company learning and cross-company learning with the Burak filter (the state-of-the-art relevancy filter). This paper finds that: 1) within-company predictors are weak for small datasets; 2) the Peters filter+cross-company builds better predictors than both within-company and the Burak filter+cross-company; and 3) the Peters filter builds 64% more useful predictors than both within-company and the Burak filter+cross-company approaches. Hence, we recommend the Peters filter for cross-company learning.
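
The abstract contrasts two ways of choosing cross-company training data but does not spell out either algorithm. The following is only a minimal sketch of that contrast, assuming Euclidean distance over numeric feature rows; the function names `local_driven_filter` and `cross_driven_filter` are hypothetical, and the code should not be read as the published Burak or Peters filter.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length numeric rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_driven_filter(local, cross, k=1):
    """Each local row picks its k nearest cross-company rows (assumed reading
    of 'training data found nearest to the local project')."""
    chosen = set()
    for row in local:
        ranked = sorted(range(len(cross)), key=lambda i: euclidean(row, cross[i]))
        chosen.update(ranked[:k])
    return sorted(chosen)


def cross_driven_filter(local, cross):
    """Each cross-company row is grouped under its nearest local row; every
    local row then keeps the closest row in its group (assumed reading of
    'selects training data via the structure of other projects')."""
    groups = {}
    for i, c in enumerate(cross):
        j = min(range(len(local)), key=lambda j: euclidean(c, local[j]))
        groups.setdefault(j, []).append(i)
    chosen = set()
    for j, members in groups.items():
        chosen.add(min(members, key=lambda i: euclidean(cross[i], local[j])))
    return sorted(chosen)


# Toy data: two local rows, two cross-company rows.
local = [(0.0, 0.0), (1.0, 0.0)]
cross = [(0.4, 0.0), (5.0, 0.0)]
print(local_driven_filter(local, cross, k=1))
print(cross_driven_filter(local, cross))
```

On this toy data the two selections differ: both local rows choose the same nearby cross-company row in the local-driven version, while the cross-driven version also retains the distant row because it forms its own group.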


Text Filtering and Ranking for Security Bug Report Prediction

Peters

Tun

et al. 2019

IEEE Trans. Software Eng.


Abstract: Security bug reports can describe security-critical vulnerabilities in software products. Bug tracking systems may contain thousands of bug reports, of which relatively few are security related. Therefore, finding unlabelled security bugs among them can be challenging. To help security engineers identify these reports quickly and accurately, text-based prediction models have been proposed. These can often mislabel security bug reports for a number of reasons, such as class imbalance, where the ratio of non-security to security bug reports is very high. More critically, we have observed that the presence of security-related keywords in both security and non-security bug reports can lead to the mislabelling of security bug reports. This paper proposes FARSEC, a framework for filtering and ranking bug reports to reduce the presence of security-related keywords. Before building prediction models, our framework identifies and removes non-security bug reports with security-related keywords. We demonstrate that FARSEC improves the performance of text-based prediction models for security bug reports in 90% of cases. Specifically, we evaluate it with 45,940 bug reports from Chromium and four Apache projects. With our framework, we mitigate the class imbalance issue and reduce the number of mislabelled security bug reports by 38%.
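
The filtering step described in the abstract (removing non-security bug reports that contain security-related keywords before training) might look roughly like the sketch below. It is not FARSEC itself: the keyword set, the scoring rule (fraction of keywords present), the threshold, and the function names are all assumptions made for illustration.

```python
def keyword_score(text, keywords):
    """Fraction of the security keywords that appear in the report text.
    Whitespace tokenisation is a simplifying assumption."""
    words = set(text.lower().split())
    return len(words & keywords) / len(keywords)


def filter_training_reports(reports, keywords, threshold=0.5):
    """Drop non-security reports whose keyword score reaches the threshold.

    `reports` is a list of (text, is_security) pairs; security reports are
    always kept. The threshold value is a hypothetical default.
    """
    kept = []
    for text, is_security in reports:
        if not is_security and keyword_score(text, keywords) >= threshold:
            continue  # looks like a security report but is labelled otherwise
        kept.append((text, is_security))
    return kept


# Toy training set: one security report and two non-security reports.
reports = [
    ("buffer overflow exploit in parser", True),
    ("crash on overflow exploit test page", False),
    ("button color is wrong", False),
]
keywords = {"overflow", "exploit"}
for text, label in filter_training_reports(reports, keywords):
    print(label, text)
```

Removing the keyword-heavy non-security report also shrinks the majority class slightly, which is one way such a filter could ease class imbalance.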


Learning from Open-Source Projects: An Empirical Study on Defect Prediction

Peters

Menzies

et al. 2013


Balancing Privacy and Utility in Cross-Company Defect Prediction

Peters

Gong

Zhang

2013

IEEE Trans. Software Eng.

124


Abstract: Background: Cross-company defect prediction (CCDP) is a field of study where an organization lacking enough local data can use data from other organizations for building defect predictors. To support CCDP, data must be shared. Such shared data must be privatized, but that privatization could severely damage the utility of the data. Aim: To enable effective defect prediction from shared data while preserving privacy. Method: We explore privatization algorithms that maintain class boundaries in a dataset. CLIFF is an instance pruner that deletes irrelevant examples. MORPH is a data mutator that moves the data a random distance, taking care not to cross class boundaries. CLIFF+MORPH is tested in a CCDP study among 10 defect datasets from the PROMISE data repository. Results: We find that: 1) the CLIFFed+MORPHed algorithms provide more privacy than the state-of-the-art privacy algorithms; and 2) in terms of utility measured by defect prediction, CLIFF+MORPH performs significantly better. Conclusions: For the OO defect data studied here, data can be privatized and shared without a significant degradation in utility. To the best of our knowledge, this is the first published result where privatization does not compromise defect prediction.
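
To make the idea of a mutator that "moves the data a random distance, taking care not to cross class boundaries" concrete, here is a minimal sketch. It assumes one plausible rule: shift each row by a random fraction (well under one half) of its offset from the nearest row of a different class. The function name `morph_like`, the fraction bounds, and the use of Euclidean distance are illustrative assumptions, not details taken from the abstract.

```python
import math
import random


def euclidean(a, b):
    """Straight-line distance between two equal-length numeric rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def morph_like(rows, labels, rng, lo=0.15, hi=0.35):
    """Return mutated copies of `rows`.

    Each row x is moved by r * (x - z), added or subtracted at random, where
    z is the nearest row with a different label and r is drawn from [lo, hi].
    Keeping r below 0.5 is the assumed safeguard against crossing into the
    other class's region. Rows with no unlike neighbour are left unchanged.
    """
    mutated = []
    for x, label in zip(rows, labels):
        unlike = [z for z, other in zip(rows, labels) if other != label]
        if not unlike:
            mutated.append(tuple(x))
            continue
        z = min(unlike, key=lambda row: euclidean(x, row))
        r = rng.uniform(lo, hi)
        sign = rng.choice((-1, 1))
        mutated.append(tuple(xi + sign * r * (xi - zi) for xi, zi in zip(x, z)))
    return mutated


# Toy data: two rows of class 0 and one row of class 1.
rows = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
labels = [0, 0, 1]
print(morph_like(rows, labels, random.Random(0)))
```

Passing a seeded `random.Random` instance keeps the mutation reproducible for experiments; a shared dataset would presumably use an unseeded generator so that the shifts cannot be replayed.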


LACE2: Better Privacy-Preserving Data Sharing for Cross Project Defect Prediction

Peters

Menzies

Layman

2015


The Art and Science of Analyzing Software Data; Quantitative Methods

Minku

Peters²

2015


Data science for software engineering

Kocagüneli

Peters

Turhan

et al. 2013


Using Goals in Model-Based Reasoning

Kocagüneli

Minku

Peters

et al. 2015


