2023
DOI: 10.1109/tse.2022.3147265

Neural Transfer Learning for Repairing Security Vulnerabilities in C Code

Abstract: In this paper, we address the problem of automatic repair of software vulnerabilities with deep learning. The major problem with data-driven vulnerability repair is that the few existing datasets of known confirmed vulnerabilities consist of only a few thousand examples. However, training a deep learning model often requires hundreds of thousands of examples. In this work, we leverage the intuition that the bug fixing task and the vulnerability fixing task are related and that the knowledge learned from bug fi…
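
The two-step idea in the abstract (learn from plentiful bug fixes, then adapt to scarce vulnerability fixes) boils down to pre-training and then fine-tuning the same sequence-to-sequence network. The sketch below is a minimal illustration of that schedule, not the authors' implementation: the tiny Transformer model, the synthetic token-id data, and all hyperparameters are placeholder assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

PAD, VOCAB = 0, 2000  # toy vocabulary; real code-token vocabularies are much larger

class TinyRepairModel(nn.Module):
    """Minimal encoder-decoder used only to illustrate the two training stages."""
    def __init__(self, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d_model, padding_idx=PAD)
        self.net = nn.Transformer(d_model, nhead=4, num_encoder_layers=2,
                                  num_decoder_layers=2, batch_first=True)
        self.out = nn.Linear(d_model, VOCAB)

    def forward(self, src, tgt):
        # Causal mask so each target position only sees earlier positions.
        mask = self.net.generate_square_subsequent_mask(tgt.size(1))
        return self.out(self.net(self.embed(src), self.embed(tgt), tgt_mask=mask))

def run_stage(model, loader, epochs, lr):
    """One stage of teacher-forced cross-entropy training on (buggy, fixed) pairs."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss(ignore_index=PAD)
    for _ in range(epochs):
        for src, tgt in loader:
            logits = model(src, tgt[:, :-1])  # predict the next target token
            loss = loss_fn(logits.reshape(-1, VOCAB), tgt[:, 1:].reshape(-1))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

def toy_pairs(n):
    # Stand-in for a real corpus of (buggy, fixed) token-id sequences.
    return TensorDataset(torch.randint(1, VOCAB, (n, 24)),
                         torch.randint(1, VOCAB, (n, 24)))

model = TinyRepairModel()
# Stage 1: pre-train on the large, generic bug-fix corpus.
model = run_stage(model, DataLoader(toy_pairs(512), batch_size=32), epochs=1, lr=1e-4)
# Stage 2: fine-tune the *same* weights on the small vulnerability-fix corpus,
# typically with a lower learning rate so the pre-trained knowledge is preserved.
model = run_stage(model, DataLoader(toy_pairs(64), batch_size=32), epochs=1, lr=1e-5)
```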

Cited by 54 publications (27 citation statements)
References 70 publications
“…Besides, researchers use long short-term memory (LSTM) architecture to capture the long-distance dependencies among code sequences [20,107]. Recently, as a variant of the Seq2Seq model, Transformer [150] has been considered the state-of-the-art NMT repair architecture due to the self-attention mechanism [25,26,40].…”
Section: Neural Machine Translation
confidence: 99%
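
The passage credits self-attention for the Transformer's edge over LSTMs on long code sequences. As a generic textbook illustration (not the architecture of any cited system), the sketch below computes one single-head scaled dot-product self-attention step, in which every code token directly attends to every other token; learned Q/K/V projections and multiple heads are omitted for brevity.

```python
import torch

def self_attention(x):
    """Single-head scaled dot-product self-attention over a code sequence.

    x: (batch, seq_len, d_model) token embeddings. Every token attends to
    every other token in a single step, which is why a Transformer can relate
    far-apart code tokens that an LSTM must carry through many recurrent steps.
    """
    d_model = x.size(-1)
    scores = x @ x.transpose(-2, -1) / d_model ** 0.5  # (batch, seq_len, seq_len)
    weights = torch.softmax(scores, dim=-1)             # attention distribution per token
    return weights @ x                                   # context-mixed representations

# Toy usage: one "sequence" of 6 code-token embeddings of size 8.
out = self_attention(torch.randn(1, 6, 8))
print(out.shape)  # torch.Size([1, 6, 8])
```
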
“…• In the data pre-processing phase, a given software buggy code snippet (e.g., buggy statement) is taken as the input and the processed code tokens are returned. According to existing learning-based APR studies [25,26], there generally exist three potential ways to pre-process the buggy code: code context, abstraction, and tokenization. First, code context information refers to other correlated non-buggy lines within the buggy program.…”
Section: Overall Workflow
confidence: 99%
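
To make the abstraction and tokenization steps concrete, the sketch below tokenizes a hypothetical buggy C statement together with its surrounding context lines and renames identifiers and literals to indexed placeholders. The tokenizer, keyword list, and placeholder scheme are simplified assumptions, not the exact pre-processing of the cited studies.

```python
import re

C_KEYWORDS = {"if", "else", "for", "while", "return", "int", "char", "sizeof"}

def tokenize(code):
    """Very rough C tokenizer: identifiers, integer literals, then single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)

def abstract(tokens):
    """Rename identifiers/literals to indexed placeholders (VAR_1, NUM_1, ...),
    shrinking the vocabulary the model has to learn while staying reversible
    via the returned mapping."""
    mapping, counts, out = {}, {"VAR": 0, "NUM": 0}, []
    for tok in tokens:
        is_ident = (tok[0].isalpha() or tok[0] == "_") and tok not in C_KEYWORDS
        is_num = tok.isdigit()
        if is_ident or is_num:
            kind = "NUM" if is_num else "VAR"
            if tok not in mapping:
                counts[kind] += 1
                mapping[tok] = f"{kind}_{counts[kind]}"
            out.append(mapping[tok])
        else:
            out.append(tok)
    return out, mapping

# Hypothetical buggy statement plus two non-buggy context lines.
context = ["char buf[64];", "int n = read_input(buf);"]
buggy = "if (n > 64) return -1;"
abstracted, name_map = abstract(tokenize(" ".join(context + [buggy])))
print(abstracted)
# ['char', 'VAR_1', '[', 'NUM_1', ']', ';', 'int', 'VAR_2', '=', 'VAR_3', '(',
#  'VAR_1', ')', ';', 'if', '(', 'VAR_2', '>', 'NUM_1', ')', 'return', '-', 'NUM_2', ';']
```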