Software vulnerability can cause disastrous consequences for information security. Earlier detection of vulnerabilities minimizes these consequences. Manual detection of vulnerable code is very difficult and very costly in terms of time and budget. Therefore, developers must use automatic vulnerabilities prediction (AVP) tools to minimize costs. Recent works on AVP begin to use techniques of deep learning (DL). All the proposed approaches are based on techniques of feature extraction inspired by previous applications of DL such as automatic language processing. Code metrics were widely used as features to build AVP models based on classic machine learning. This study bridges the gap between deep learning and machine learning features and discusses a deep-learning-based approach to finding vulnerabilities in code using code metrics. Obtained results show that code metrics are very good but not the better to use as features in DL-based AVP.INDEX TERMS Automatic vulnerability prediction, code metrics, deep neural networks.
Most security and privacy issues in software are related to exploiting code vulnerabilities. Many studies have tried to find the correlation between the software characteristics (complexity, coupling, etc.) quantified by corresponding code metrics and its vulnerabilities and to propose automatic prediction models that help developers locate vulnerable components to minimize maintenance costs. The results obtained by these studies cannot be applied directly to web applications because a web application differs in many ways from a non-web application: development, use, etc. and a lot of evaluation of these conclusions has to be made. The purpose of this study is to evaluate and compare the vulnerabilities prediction power of three types of code metrics in web applications. There are a few similar studies that targeted non-web application and to the best of our knowledge, there are no similar studies that targeted web applications. The results obtained show that unlike non-web applications where complexity metrics have better vulnerability prediction power, in web applications the metrics that give better prediction are the coupling metrics with high recall (> 75%) and fewer costs in terms of inspection (<25%).
Automatic vulnerabilities prediction assists developers and minimizes resources allocated to fix software security issues. These costs can be minimized even more if the exact location of vulnerability is correctly indicated. In this study, the authors propose a new approach to using code metrics in vulnerability detection. The strength part of the proposed approach lies in using code metrics not to simply quantify characteristics of software components at a coarse granularity (package, file, class, function) such as complexity, coupling, etc., which is the approach commonly used in previous studies, but to quantify extracted pieces of code that hint presence of vulnerabilities at a fine granularity (few lines of code). Obtained results show that code metrics can be used with a machine learning technique not only to indicate vulnerable components wish was the aim of previous approaches but also to detect and locate vulnerabilities with very good accuracy.
Locating vulnerable lines of code in large software systems needs huge efforts from human experts. This explains the high costs in terms of budget and time needed to correct vulnerabilities. To minimize these costs, automatic solutions of vulnerabilities prediction have been proposed. Existing machine learning (ML)-based solutions face difficulties in predicting vulnerabilities in coarse granularity and in defining suitable code features that limit their effectiveness. To addressee these limitations, in the present work, the authors propose an improved ML-based approach using slice-based code representation and the technique of TF-IDF to automatically extract effective features. The obtained results showed that combining these two techniques with ML techniques allows building effective vulnerability prediction models (VPMs) that locate vulnerabilities in a finer granularity and with excellent performances (high precision (>98%), low FNR (<2%) and low FPR (<3%) which outperforms software metrics and are equivalent to the best performing recent deep learning-based approaches.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.