2022
DOI: 10.48550/arxiv.2203.04468
Preprint

Noisy Label Learning for Security Defects

Abstract: Data-driven software engineering processes, such as vulnerability prediction, rely heavily on the quality of the data used. In this paper, we observe that it is infeasible to obtain a noise-free security defect dataset in practice. Unlike the vulnerable class, non-vulnerable modules are difficult to verify as truly exploit-free given the limited manual effort available. This results in uncertainty, introduces labeling noise into the datasets, and affects conclusion validity. To address this…

Cited by 1 publication (7 citation statements)
References 53 publications
“…In reality, complete knowledge of the latent vulnerabilities is unobtainable. Croft et al [13] observed at least twice as many latent vulnerabilities as known vulnerabilities in their dataset.…”
Section: Consistency (mentioning)
confidence: 93%
“…CodeBERT is a pre-trained state-of-the-art code embedding model based on the RoBERTa architecture [52]. Similar studies have demonstrated the effectiveness of CodeBERT for SVP [8], [13]. LineVul generates function-level predictions using a transformer-based architecture.…”
Section: Validating Attribute Impact (mentioning)
confidence: 99%