Abstract: Software Vulnerability Prediction (SVP) is a data-driven technique for software quality assurance that has recently gained considerable attention in the Software Engineering research community. However, the difficulties of preparing Software Vulnerability (SV) related data remain the main barrier to industrial adoption. Despite this problem, there have been no systematic efforts to analyse the existing SV data preparation techniques and challenges. Without such insights, we are unable to overcome the challenges…
“…Mining software repositories has become a popular area of empirical software engineering research [36]. However, the use of these data sources does not come without perils; the data is not necessarily clean and can exhibit significant noise [37], [38]. Our study contributes to this body of knowledge by highlighting a unique data quality issue that is present in SV reporting data sources: SV severity ranking inconsistency.…”
Section: A. Vulnerability Report Inconsistencies
Software Vulnerability (SV) severity assessment is a vital task for informing SV remediation and triage. Rankings of SV severity scores are often used to guide the prioritization of patching efforts. However, severity assessment is a difficult and subjective manual task that relies on expertise, knowledge, and standardized reporting schemes. Consequently, different data sources that perform independent analysis may provide conflicting severity rankings. Inconsistency across these data sources affects the reliability of severity assessment data and can consequently impact SV prioritization and fixing. In this study, we investigate severity ranking inconsistencies over the SV reporting lifecycle. Our analysis helps characterize the nature of this problem, identify correlated factors, and determine the impacts of inconsistency on downstream tasks. We observe that SV severity often lacks consideration or is underestimated during initial reporting, and such SVs consequently receive lower prioritization. We identify six potential attributes that are correlated with this misjudgment, and show that inconsistency in severity reporting schemes can severely degrade the performance of downstream severity prediction by up to 77%. Our findings help raise awareness of SV severity data inconsistencies and draw attention to this data quality problem. These insights can help developers better select SV severity data sources and improve the reliability of consequent SV prioritization. Furthermore, we encourage researchers to pay more attention to SV severity data selection.
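The ranking comparison at the heart of this inconsistency analysis can be illustrated with a short sketch. The snippet below maps CVSS v3.1 base scores to their standard qualitative ratings and flags records where two sources disagree; the source names and scores are hypothetical placeholders, and the study's own ranking schemes may differ.

```python
# Sketch: flag severity-ranking inconsistencies between two SV data sources.
# The CVSS v3.1 qualitative scale is standard; the source names and example
# scores below are hypothetical and only illustrate the comparison.

def cvss_v3_rating(score: float) -> str:
    """Map a CVSS v3.1 base score to its qualitative severity rating."""
    if score == 0.0:
        return "None"
    if score <= 3.9:
        return "Low"
    if score <= 6.9:
        return "Medium"
    if score <= 8.9:
        return "High"
    return "Critical"

# Hypothetical severity scores for the same CVEs from two independent sources.
nvd_scores = {"CVE-2021-0001": 9.8, "CVE-2021-0002": 5.3}
vendor_scores = {"CVE-2021-0001": 7.5, "CVE-2021-0002": 5.0}

for cve in nvd_scores.keys() & vendor_scores.keys():
    a = cvss_v3_rating(nvd_scores[cve])
    b = cvss_v3_rating(vendor_scores[cve])
    if a != b:
        print(f"{cve}: severity ranking inconsistency ({a} vs {b})")
```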
“…Thus, to achieve a better and more practical solution, our study extends this knowledge by investigating the feasibility of a variety of NLL techniques. Furthermore, we are the first to analyse noise tolerance for security defect datasets, which have been suggested to exhibit even greater data quality issues than regular defect datasets [10].…”
Section: Noise Tolerant Approaches For Defect Prediction
“…We firstly opted to conduct our classification at the file-level; we assigned each source code file a label as to whether it contains a reported vulnerability or not. The majority of SVP research has been conducted at the file-level [10], but recent state-of-the-art models have moved towards finer granularities [39]. However, we choose to retain our prediction at the file-level, as further localisation of vulnerable code within files would introduce additional noise and distrust in our positive labels.…” (Footnote 2: https://cwe.mitre.org/top25/archive/2021/2021_cwe_top25.html)
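For readers unfamiliar with file-level labelling, a minimal sketch is given below, assuming vulnerability-fixing commits have already been traced to the files they modify. The helper name and example paths are hypothetical; a real pipeline would mine this mapping from vulnerability reports and the project's version control history.

```python
# Minimal sketch of file-level labelling, assuming vulnerability-fixing
# commits have already been mapped to the files they modify. The helper
# and data below are hypothetical placeholders.

from typing import Dict, Iterable, Set

def label_files(all_files: Iterable[str], vuln_fixed_files: Set[str]) -> Dict[str, int]:
    """Label each source file: 1 if a reported SV was fixed in it, else 0."""
    return {f: int(f in vuln_fixed_files) for f in all_files}

# Hypothetical project snapshot and files touched by vulnerability fixes.
files = ["auth/login.c", "net/parser.c", "ui/render.c"]
fixed = {"net/parser.c"}

labels = label_files(files, fixed)
print(labels)  # {'auth/login.c': 0, 'net/parser.c': 1, 'ui/render.c': 0}
```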
“…Firstly, class imbalance is a significant issue prevalent in SV datasets [10], which negatively influences the prediction capabilities of data-driven models. Although solutions to class imbalance exist, such as rebalancing and reweighting, both of which we have investigated and implemented, they are far from complete.…”
Section: Difficulties In Adoption
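As a concrete illustration of the reweighting mentioned in the preceding snippet, the sketch below uses scikit-learn's balanced class weights on a synthetic, heavily imbalanced dataset; the feature matrix and labels are placeholders rather than the authors' data.

```python
# Sketch of the reweighting mitigation for class imbalance, using scikit-learn.
# X and y are synthetic placeholders; any vectorised SV dataset with a
# minority vulnerable class would fit the same pattern.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = np.r_[np.ones(50, dtype=int), np.zeros(950, dtype=int)]  # ~5% vulnerable

# Reweighting: penalise mistakes on the rare vulnerable class more heavily.
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)
clf = LogisticRegression(class_weight={0: weights[0], 1: weights[1]}, max_iter=1000)
clf.fit(X, y)

# Rebalancing (e.g. over-sampling the minority class) is the other option;
# libraries such as imbalanced-learn provide ready-made samplers.
```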
“…Defect data preparation requires code modules to be labeled as clean or defective. To achieve this, researchers typically collect reported post-release software defects for identifying the faulty code modules [10]. The label correctness is inherently critical for training and evaluation of a prediction model [2], and mislabeled instances can heavily influence research outcomes [31,60].…”
Data-driven software engineering processes, such as vulnerability prediction, rely heavily on the quality of the data used. In this paper, we observe that it is infeasible to obtain a noise-free security defect dataset in practice. Unlike the vulnerable class, the non-vulnerable modules are difficult to verify and determine as truly exploit free given the limited manual effort available. This results in uncertainty, introduces labeling noise into the datasets, and affects conclusion validity. To address this issue, we propose novel learning methods that are robust to label impurities and can leverage the most from limited labeled data: noisy label learning. We investigate various noisy label learning methods applied to software vulnerability prediction. Specifically, we propose a two-stage learning method based on noise cleaning to identify and remediate the noisy samples, which improves the AUC and recall of baselines by up to 8.9% and 23.4%, respectively. Moreover, we discuss several hurdles to achieving a performance upper bound with semi-omniscient knowledge of the label noise. Overall, the experimental results show that learning from noisy labels can be effective for data-driven software and security analytics.
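To make the two-stage idea concrete, the following sketch approximates a noise-cleaning workflow with out-of-fold probabilities: samples whose observed label receives a very low predicted probability are flagged as likely mislabelled and dropped before retraining. This is an illustrative approximation under assumed thresholds and models, not the paper's exact method.

```python
# Generic two-stage noise-cleaning sketch (identify suspected label noise,
# then retrain on the retained samples). Illustrative only; the base model
# and threshold are assumptions.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

def clean_and_retrain(X, y, threshold=0.8):
    """X: feature matrix (ndarray), y: integer labels 0/1 (ndarray)."""
    base = RandomForestClassifier(n_estimators=100, random_state=0)

    # Stage 1: out-of-fold class probabilities; a sample whose observed label
    # gets very low predicted probability is flagged as potentially mislabelled.
    proba = cross_val_predict(base, X, y, cv=5, method="predict_proba")
    suspect = proba[np.arange(len(y)), y] < (1.0 - threshold)

    # Stage 2: retrain on the presumed-clean samples only.
    keep = ~suspect
    cleaned = RandomForestClassifier(n_estimators=100, random_state=0)
    cleaned.fit(X[keep], y[keep])
    return cleaned, suspect
```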