2023
DOI: 10.1109/tse.2022.3147265

Neural Transfer Learning for Repairing Security Vulnerabilities in C Code

Abstract: In this paper, we address the problem of automatic repair of software vulnerabilities with deep learning. The major problem with data-driven vulnerability repair is that the few existing datasets of known confirmed vulnerabilities consist of only a few thousand examples. However, training a deep learning model often requires hundreds of thousands of examples. In this work, we leverage the intuition that the bug fixing task and the vulnerability fixing task are related and that the knowledge learned from bug fi…
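
The two-step idea in the abstract (learn from plentiful bug fixes, then adapt to scarce vulnerability fixes) boils down to pre-training and then fine-tuning the same sequence-to-sequence network. The sketch below is a minimal illustration of that schedule, not the authors' implementation: the tiny Transformer model, the synthetic token-id data, and all hyperparameters are placeholder assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

PAD, VOCAB = 0, 2000  # toy vocabulary; real code-token vocabularies are much larger

class TinyRepairModel(nn.Module):
    """Minimal encoder-decoder used only to illustrate the two training stages."""
    def __init__(self, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d_model, padding_idx=PAD)
        self.net = nn.Transformer(d_model, nhead=4, num_encoder_layers=2,
                                  num_decoder_layers=2, batch_first=True)
        self.out = nn.Linear(d_model, VOCAB)

    def forward(self, src, tgt):
        # Causal mask so each target position only sees earlier positions.
        mask = self.net.generate_square_subsequent_mask(tgt.size(1))
        return self.out(self.net(self.embed(src), self.embed(tgt), tgt_mask=mask))

def run_stage(model, loader, epochs, lr):
    """One stage of teacher-forced cross-entropy training on (buggy, fixed) pairs."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss(ignore_index=PAD)
    for _ in range(epochs):
        for src, tgt in loader:
            logits = model(src, tgt[:, :-1])  # predict the next target token
            loss = loss_fn(logits.reshape(-1, VOCAB), tgt[:, 1:].reshape(-1))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

def toy_pairs(n):
    # Stand-in for a real corpus of (buggy, fixed) token-id sequences.
    return TensorDataset(torch.randint(1, VOCAB, (n, 24)),
                         torch.randint(1, VOCAB, (n, 24)))

model = TinyRepairModel()
# Stage 1: pre-train on the large, generic bug-fix corpus.
model = run_stage(model, DataLoader(toy_pairs(512), batch_size=32), epochs=1, lr=1e-4)
# Stage 2: fine-tune the *same* weights on the small vulnerability-fix corpus,
# typically with a lower learning rate so the pre-trained knowledge is preserved.
model = run_stage(model, DataLoader(toy_pairs(64), batch_size=32), epochs=1, lr=1e-5)
```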

Cited by 54 publications (27 citation statements)
References 70 publications
“…Besides, researchers use long short-term memory (LSTM) architecture to capture the long-distance dependencies among code sequences [20,107]. Recently, as a variant of the Seq2Seq model, Transformer [150] has been considered the state-of-the-art NMT repair architecture due to the self-attention mechanism [25,26,40].…”
Section: Neural Machine Translation
confidence: 99%
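
The passage credits self-attention for the Transformer's edge over LSTMs on long code sequences. As a generic textbook illustration (not the architecture of any cited system), the sketch below computes one single-head scaled dot-product self-attention step, in which every code token directly attends to every other token; learned Q/K/V projections and multiple heads are omitted for brevity.

```python
import torch

def self_attention(x):
    """Single-head scaled dot-product self-attention over a code sequence.

    x: (batch, seq_len, d_model) token embeddings. Every token attends to
    every other token in a single step, which is why a Transformer can relate
    far-apart code tokens that an LSTM must carry through many recurrent steps.
    """
    d_model = x.size(-1)
    scores = x @ x.transpose(-2, -1) / d_model ** 0.5  # (batch, seq_len, seq_len)
    weights = torch.softmax(scores, dim=-1)             # attention distribution per token
    return weights @ x                                   # context-mixed representations

# Toy usage: one "sequence" of 6 code-token embeddings of size 8.
out = self_attention(torch.randn(1, 6, 8))
print(out.shape)  # torch.Size([1, 6, 8])
```
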
“…• In the data pre-processing phase, a given software buggy code snippet (e.g., buggy statement) is taken as the input and the processed code tokens are returned. According to existing learning-based APR studies [25,26], there generally exist three potential ways to pre-process the buggy code: code context, abstraction, and tokenization. First, code context information refers to other correlated non-buggy lines within the buggy program.…”
Section: Overall Workflow
confidence: 99%
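
To make the abstraction and tokenization steps concrete, the sketch below tokenizes a hypothetical buggy C statement together with its surrounding context lines and renames identifiers and literals to indexed placeholders. The tokenizer, keyword list, and placeholder scheme are simplified assumptions, not the exact pre-processing of the cited studies.

```python
import re

C_KEYWORDS = {"if", "else", "for", "while", "return", "int", "char", "sizeof"}

def tokenize(code):
    """Very rough C tokenizer: identifiers, integer literals, then single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)

def abstract(tokens):
    """Rename identifiers/literals to indexed placeholders (VAR_1, NUM_1, ...),
    shrinking the vocabulary the model has to learn while staying reversible
    via the returned mapping."""
    mapping, counts, out = {}, {"VAR": 0, "NUM": 0}, []
    for tok in tokens:
        is_ident = (tok[0].isalpha() or tok[0] == "_") and tok not in C_KEYWORDS
        is_num = tok.isdigit()
        if is_ident or is_num:
            kind = "NUM" if is_num else "VAR"
            if tok not in mapping:
                counts[kind] += 1
                mapping[tok] = f"{kind}_{counts[kind]}"
            out.append(mapping[tok])
        else:
            out.append(tok)
    return out, mapping

# Hypothetical buggy statement plus two non-buggy context lines.
context = ["char buf[64];", "int n = read_input(buf);"]
buggy = "if (n > 64) return -1;"
abstracted, name_map = abstract(tokenize(" ".join(context + [buggy])))
print(abstracted)
# ['char', 'VAR_1', '[', 'NUM_1', ']', ';', 'int', 'VAR_2', '=', 'VAR_3', '(',
#  'VAR_1', ')', ';', 'if', '(', 'VAR_2', '>', 'NUM_1', ')', 'return', '-', 'NUM_2', ';']
```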