2021
DOI: 10.48550/arxiv.2109.05740
Preprint

Data Preparation for Software Vulnerability Prediction: A Systematic Literature Review

Abstract: Software Vulnerability Prediction (SVP) is a data-driven technique for software quality assurance that has recently gained considerable attention in the Software Engineering research community. However, the difficulty of preparing Software Vulnerability (SV) related data remains the main barrier to industrial adoption. Despite this problem, there have been no systematic efforts to analyse the existing SV data preparation techniques and challenges. Without such insights, we are unable to overcome the chall…

Citations: cited by 3 publications (8 citation statements)
References: 54 publications
“…Mining software repositories has become a popular area of empirical software engineering research [36]. However, the use of these data sources does not come without perils; the data is not necessarily clean and can exhibit significant noise [37], [38]. Our study contributes to this body of knowledge by highlighting a unique data quality issue that is present in SV reporting data sources; SV severity ranking inconsistency.…”
Section: A Vulnerability Report Inconsistencies (mentioning)
confidence: 87%
“…Thus, to achieve a better and more practical solution, our study extends this knowledge by investigating the feasibility of a variety of NLL techniques. Furthermore, we are the first to analyse noise tolerance for security defect datasets, which have been suggested to exhibit even greater data quality issues than regular defect datasets [10].…”
Section: Noise Tolerant Approaches For Defect Prediction (mentioning)
confidence: 99%
“…We firstly opted to conduct our classification at the file-level; we assigned each source code file a label as to whether it contains a reported vulnerability or not. The majority of SVP research has been conducted at the file-level [10], but recent state-of-the-art models have moved towards finer granularities [39]. However, we choose to retain our prediction at the file-level, as further localisation of vulnerable code within files would introduce additional noise and distrust in our positive labels.…”
Footnote 2: https://cwe.mitre.org/top25/archive/2021/2021_cwe_top25.html
Section: Software Vulnerability Prediction (mentioning)
confidence: 99%