2020
DOI: 10.1109/access.2020.3041181
Vulnerable Code Detection Using Software Metrics and Machine Learning

Abstract: Software metrics are widely used indicators of software quality and several studies have shown that such metrics can be used to estimate the presence of vulnerabilities in the code. In this paper, we present a comprehensive experiment to study how effective software metrics can be to distinguish the vulnerable code units from the non-vulnerable ones. To this end, we use several machine learning algorithms (Random Forest, Extreme Boosting, Decision Tree, SVM Linear, and SVM Radial) to extract vulnerability-rela…
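The core idea the abstract describes — that software metrics can separate vulnerable from non-vulnerable code units — can be illustrated with a minimal sketch. This is not the paper's pipeline: the metric (cyclomatic complexity), the synthetic labels, and the one-threshold classifier (a single-split decision tree, the simplest relative of the tree-based models the paper uses) are all assumptions for illustration.

```python
# Minimal sketch: can one software metric separate vulnerable from
# non-vulnerable functions? Data is synthetic; in the paper's setting,
# metrics are computed from real code and labels come from known CVEs.
import random

random.seed(0)
# Hypothetical dataset of (cyclomatic_complexity, is_vulnerable) pairs:
# non-vulnerable functions drawn around complexity 8, vulnerable around 14.
data = [(random.gauss(8, 2), 0) for _ in range(200)] + \
       [(random.gauss(14, 3), 1) for _ in range(200)]
random.shuffle(data)
train_rows, test_rows = data[:300], data[300:]

def accuracy(threshold, rows):
    # Classify "vulnerable" when complexity exceeds the threshold
    return sum((cc > threshold) == bool(label) for cc, label in rows) / len(rows)

# Pick the complexity threshold that best separates the training set
best = max((cc for cc, _ in train_rows), key=lambda t: accuracy(t, train_rows))
print(f"threshold={best:.1f}  test accuracy={accuracy(best, test_rows):.2f}")
```

A real study would use many metrics (complexity, churn, developer activity) and a full model such as Random Forest, but even this stump shows why a metric-only signal can be informative yet imperfect: the two distributions overlap, so some misclassification is unavoidable.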

Cited by 26 publications (10 citation statements)
References 62 publications
“…The traditional vulnerability detection approach based on deep learning involves detecting differences in expert metrics [23][24][25]. However, code metrics may sometimes be identical, leading to misjudgment.…”
Section: Graph-based Vulnerability Detection (mentioning)
confidence: 99%
“…It was also observed that Artificial Neural Network (ANN) is the most widely applied algorithm, with Linear Regression being the least preferred technique. In [16], the author discusses applying machine learning techniques to the implementation phase and suggests how to detect vulnerable or faulty code in the software.…”
Section: Related Work (mentioning)
confidence: 99%
“…To evaluate our criterion under realistic conditions, we selected for each subject program the model plus cutoff value that found at least 90% of the bug-triggering and thus vulnerable functions, while also flagging the fewest functions reachable from the fuzz entry. We chose this detection rate threshold because it is a realistic value, as shown by our evaluation (see Section 5.1) and related studies [40,54,55].…”
Section: Selected Models and Cutoffs (mentioning)
confidence: 99%
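The cutoff-selection procedure quoted above — pick the strictest cutoff that still recovers at least 90% of known vulnerable functions, so that as few functions as possible are flagged — can be sketched as follows. The scores and labels are made up for illustration; in the cited work they come from a trained model and ground-truth bug data.

```python
# Hedged sketch of target-recall cutoff selection: sweep candidate
# cutoffs from strictest (highest score) downward and stop at the first
# one whose recall over known-vulnerable functions reaches the target.
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]  # 1 = known vulnerable (hypothetical)

def pick_cutoff(scores, labels, target_recall=0.9):
    total_vuln = sum(labels)
    for cut in sorted(set(scores), reverse=True):
        found = sum(l for s, l in zip(scores, labels) if s >= cut)
        if found / total_vuln >= target_recall:
            return cut
    return min(scores)

cut = pick_cutoff(scores, labels)
flagged = sum(s >= cut for s in scores)
print(f"cutoff={cut}, flagged={flagged} of {len(scores)} functions")
# → cutoff=0.55, flagged=7 of 10 functions
```

Sweeping from the strictest cutoff downward guarantees the returned value is the highest one meeting the recall target, which is exactly what minimizes the number of flagged functions for a monotone score ranking.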
“…Most of the existing studies on ML-based vulnerability prediction consider one specific class of features instead of several. Such studies usually adopt software metrics (such as complexity, code churn, and developer activity) [34,54,55,59,73], text mining [21,73,77], or static code analysis (e.g., data/control-flow and/or taint checking) [40,62,68]. Interestingly, some of these studies have also found that the comparably simple features can compete with and sometimes even outperform the more complex ones.…”
Section: Related Work (mentioning)
confidence: 99%