“…Due to the long-tailed distribution of CWE categories, we use three metrics, i.e., Macro F1, Weighted F1 and the multi-class version of the Matthews Correlation Coefficient (MCC) [23], for evaluation. These metrics are also used by other vulnerability-related studies [28,38]. Macro F1 is the unweighted mean of the F1-scores of all categories, whereas Weighted F1 is the mean weighted by the number of instances in each category.…”
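The difference between the two F1 variants matters precisely because of the long-tailed distribution the excerpt mentions: Macro F1 gives rare CWE categories the same influence as dominant ones, while Weighted F1 lets frequent categories dominate. A minimal sketch (the labels and example predictions are invented for illustration, not taken from the paper's dataset):

```python
from collections import Counter

def per_class_f1(y_true, y_pred, label):
    """F1-score for a single class, computed from TP/FP/FN counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

def macro_and_weighted_f1(y_true, y_pred):
    """Macro F1 averages per-class F1 equally; Weighted F1 weights by support."""
    labels = sorted(set(y_true))
    support = Counter(y_true)
    f1s = {c: per_class_f1(y_true, y_pred, c) for c in labels}
    macro = sum(f1s.values()) / len(labels)
    weighted = sum(f1s[c] * support[c] for c in labels) / len(y_true)
    return macro, weighted

# Long-tailed toy example: "CWE-787" dominates, "CWE-416" is rare.
y_true = ["CWE-787"] * 8 + ["CWE-416"] * 2
y_pred = ["CWE-787"] * 8 + ["CWE-787", "CWE-416"]  # one rare instance missed
macro, weighted = macro_and_weighted_f1(y_true, y_pred)
# Macro F1 drops noticeably (rare-class errors count fully);
# Weighted F1 stays high because the dominant class is almost perfect.
```

On this toy split, the single error on the rare class pulls Macro F1 down to about 0.80 while Weighted F1 remains near 0.89, which is exactly why long-tailed studies report both.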
Section: Methods (mentioning)
confidence: 99%
“…It is crucial to detect, categorize and assess vulnerabilities. Due to the rapid increase in the number of software vulnerabilities and the success of deep learning techniques, researchers have proposed diverse deep-learning-based approaches to automate vulnerability analysis, such as vulnerability detection [14,68], classification [8,70], patch identification [66,69] and assessment [37,38], and achieved promising results.…”
Section: Introduction (mentioning)
confidence: 99%
“…Vulnerability assessment is a process that determines various characteristics of vulnerabilities and helps practitioners prioritize the remediation of critical vulnerabilities [37,38]. CVSS is a commonly used expert-based vulnerability assessment framework.…”
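The excerpt names CVSS as the expert-based assessment framework that such prioritization relies on. As a concrete anchor, the CVSS v3.x specification maps base scores to qualitative severity ratings (None, Low, Medium, High, Critical), which is the output many assessment approaches predict. A small helper encoding that published mapping (the function name is ours, not from the paper):

```python
def cvss_v3_severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating,
    following the score ranges published in the CVSS v3.x specification."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS base scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score <= 3.9:
        return "Low"
    if score <= 6.9:
        return "Medium"
    if score <= 8.9:
        return "High"
    return "Critical"  # 9.0 - 10.0
```

For example, a base score of 9.8 rates as Critical, which is the kind of vulnerability the excerpt suggests practitioners should remediate first.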
Vulnerability analysis is crucial for software security. Inspired by the success of pre-trained models on software engineering tasks, this work focuses on using pre-training techniques to enhance the understanding of vulnerable code and boost vulnerability analysis. The code understanding ability of a pre-trained model is highly related to its pre-training objectives. The semantic structure, e.g., control and data dependencies, of code is important for vulnerability analysis. However, existing pre-training objectives either ignore such structure or focus on learning to use it. The feasibility and benefits of learning the knowledge of analyzing semantic structure have not been investigated. To this end, this work proposes two novel pre-training objectives, namely Control Dependency Prediction (CDP) and Data Dependency Prediction (DDP), which aim to predict the statement-level control dependencies and token-level data dependencies, respectively, in a code snippet based only on its source code. During pre-training, CDP and DDP can guide the model to learn the knowledge required for analyzing fine-grained dependencies in code. After pre-training, the pre-trained model can boost the understanding of vulnerable code during fine-tuning and can directly be used to perform dependence analysis for both partial and complete functions. To demonstrate the benefits of our pre-training objectives, we pre-train a Transformer model named PDBERT with CDP and DDP, fine-tune it on three vulnerability analysis tasks, i.e., vulnerability detection, vulnerability classification, and vulnerability assessment, and also evaluate it on program dependence analysis.
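To make the DDP objective concrete: the model is trained to predict dependency pairs of this kind directly from raw source text. The sketch below is a deliberately simplified, statement-level def-use labeler (PDBERT's actual labels come from proper program analysis and are token-level; the regex-based extractor and its names are ours), showing the shape of the supervision signal:

```python
import re

def data_dependencies(statements):
    """Rough def-use sketch: statement i depends on statement j if i reads
    a variable that j was the most recent statement to assign.
    Returns a set of (user_index, definer_index) pairs."""
    last_def = {}   # variable name -> index of statement that last assigned it
    deps = set()
    for i, stmt in enumerate(statements):
        m = re.match(r"\s*(\w+)\s*=[^=]", stmt)   # simple "x = ..." assignment
        target = m.group(1) if m else None
        rhs = stmt.split("=", 1)[1] if m else stmt
        for var in re.findall(r"\b[a-zA-Z_]\w*\b", rhs):
            if var in last_def:
                deps.add((i, last_def[var]))
        if target:
            last_def[target] = i   # record the new definition after uses
    return deps

code = [
    "a = read_input()",  # 0: defines a
    "b = a + 1",         # 1: reads a  -> depends on 0
    "c = b * a",         # 2: reads b, a -> depends on 1 and 0
]
deps = data_dependencies(code)
# deps == {(1, 0), (2, 1), (2, 0)}
```

Pairs like these are exactly what DDP asks the model to recover without access to a parser, which is why a model that succeeds at the objective must internalize dependence analysis.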
“…For this example, lines 2-4, line 8 and line 10 are unrelated to the content and intent of this code change. As discussed in Section I, existing code change representation approaches either ignore the context [3], [8], [16], do not highlight the changed code [2], [13], [18], or consider all the context without adaptive information selection [14], [17]. These limitations hinder their effectiveness and generality, and motivate us to propose the query-back mechanism to explicitly highlight the changed code and learn to adaptively capture information from the code change.…”
Section: Motivation Of Query-back Mechanism (mentioning)
confidence: 99%
“…However, many of them adopt task-specific architectures and are trained from scratch, which makes it non-trivial to adapt them to other tasks, especially tasks with only small datasets. In addition, existing learning-based techniques either only focus on the changed code [3], [8], [16], separately encode the changed code and its context [14], [17], or encode the code change as a whole [2], [13], [18]. Some of them ignore the context or do not highlight the changed code.…”
Representing code changes as numeric feature vectors, i.e., code change representations, is usually an essential step to automate many software engineering tasks related to code changes, e.g., commit message generation and just-in-time defect prediction. Intuitively, the quality of code change representations is crucial for the effectiveness of automated approaches. Prior work on code changes usually designs and evaluates code change representation approaches for a specific task, and little work has investigated code change encoders that can be used and jointly trained on various tasks. To fill this gap, this work proposes a novel Code Change Representation learning approach named CCRep, which can learn to encode code changes as feature vectors for diverse downstream tasks. Specifically, CCRep regards a code change as the combination of its before-change and after-change code, leverages a pre-trained code model to obtain high-quality contextual embeddings of code, and uses a novel mechanism named query-back to extract and encode the changed code fragments and make them explicitly interact with the whole code change. To evaluate CCRep and demonstrate its applicability to diverse code-change-related tasks, we apply it to three tasks: commit message generation, patch correctness assessment, and just-in-time defect prediction. Experimental results show that CCRep outperforms the state-of-the-art techniques on each task.
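The first step the query-back mechanism depends on is identifying the changed fragments within the before/after pair, so that their embeddings can act as queries attending over the whole code change. A minimal line-level sketch of that extraction step using Python's `difflib` (the helper and its name are ours, not CCRep's actual implementation, which operates on model embeddings rather than raw lines):

```python
import difflib

def changed_fragments(before, after):
    """Return the removed and added lines of a code change.
    In a query-back-style encoder, the representations of these changed
    fragments would serve as attention queries over the full before/after
    code, letting the model adaptively select relevant context."""
    sm = difflib.SequenceMatcher(a=before, b=after)
    removed, added = [], []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(before[i1:i2])
        if tag in ("replace", "insert"):
            added.extend(after[j1:j2])
    return removed, added

before = ["int f(int x) {", "  return x;", "}"]
after  = ["int f(int x) {", "  return x + 1;", "}"]
removed, added = changed_fragments(before, after)
# removed == ["  return x;"], added == ["  return x + 1;"]
```

Separating the changed fragments from their unchanged surroundings is what allows the mechanism to both highlight the change and still draw on context, rather than committing to one or the other as the cited prior approaches do.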