A Practical Approach to the Automatic Classification of Security-Relevant Commits

Sabetta, Antonino; Bezzi, Michele

doi:10.1109/icsme.2018.00058

Cited by 59 publications

(62 citation statements)

References 8 publications


“…Another example of a possible application of the dataset is presented in [8]. Motivated by the need to automate the Vulnerability Id.…”

Section: Applications (mentioning)

confidence: 99%

“…[Table residue. Commits per vulnerability: CVE-2015-5348: 23; CVE-2012-0022: 15; CVE-2018-8009: 13; CVE-2016-6801: 13; CVE-2016-8749: 12; CVE-2018-8027: 12; CVE-2014-0119: 11; CVE-2012-2098: 10; CVE-2013-… (truncated). A second listing headed Commits, truncated and of unclear layout: 1 181 2 166 3 256 4 259 5 308 6 282 7 257 8 136 9 144 10 172 11 170 12 129 13 128 14 512 15] …maintenance of the very vulnerability database from which our dataset is extracted, Sabetta and Bezzi [8] presented a novel approach to the automated classification of commits that are security-relevant (i.e., that are likely to fix a vulnerability). They used (an older, and smaller version of) the dataset presented here to train two independent classifiers, considering, respectively, the patch introduced by a commit (Patch Classifier) and the log messages (Message Classifier), without relying on information from vulnerability advisories.…”

Section: Applications (mentioning)

confidence: 99%
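
The statement above names two inputs, the patch and the log message, each feeding its own classifier. As a rough illustration only, the sketch below separates those two views of a commit before any classification happens. It assumes the commit is available as a log message followed by a unified diff; the function name `split_commit` and the sample commit are hypothetical and are not taken from the cited papers.

```python
def split_commit(commit_text):
    """Split a commit into (message, changed_lines).

    Assumption: the log message is everything before the first line
    that starts with 'diff --git'; the rest is a unified diff.
    """
    message_lines, changed_lines = [], []
    in_patch = False
    for line in commit_text.splitlines():
        if line.startswith("diff --git"):
            in_patch = True
            continue
        if not in_patch:
            message_lines.append(line.strip())
        elif line.startswith(("+++", "---")):
            # File headers of the diff, not code changes. A changed line
            # whose own content starts with "--" or "++" would also be
            # skipped here; acceptable for a sketch.
            continue
        elif line.startswith(("+", "-")):
            # Added or removed line: drop the marker, keep the code.
            changed_lines.append(line[1:].strip())
    message = " ".join(part for part in message_lines if part)
    return message, changed_lines


# Invented sample commit, for illustration only.
SAMPLE = """Fix path traversal in archive extraction

diff --git a/Unzip.java b/Unzip.java
--- a/Unzip.java
+++ b/Unzip.java
@@ -10,2 +10,3 @@
-File out = new File(dir, name);
+File out = new File(dir, name);
+checkInside(dir, out);
"""

message, changed = split_commit(SAMPLE)
print(message)   # text for a message-based classifier
print(changed)   # code lines for a patch-based classifier
```

Keeping the two views separate means each classifier can be trained and evaluated on its own, which matches the "two independent classifiers" wording of the statement.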


A Manually-Curated Dataset of Fixes to Vulnerabilities of Open-Source Software

Ponta, Plate, Sabetta et al. 2019

2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)

Self Cite


Advancing our understanding of software vulnerabilities, automating their identification, the analysis of their impact, and ultimately their mitigation is necessary to enable the development of software that is more secure. While operating a vulnerability assessment tool that we developed and that is currently used by hundreds of development units at SAP, we manually collected and curated a dataset of vulnerabilities of open-source software and the commits fixing them. The data was obtained both from the National Vulnerability Database (NVD) and from project-specific Web resources that we monitor on a continuous basis. From that data, we extracted a dataset that maps 624 publicly disclosed vulnerabilities affecting 205 distinct open-source Java projects, used in SAP products or internal tools, onto the 1282 commits that fix them. Out of 624 vulnerabilities, 29 do not have a CVE identifier at all and 46, which do have a CVE identifier assigned by a numbering authority, are not available in the NVD yet. The dataset is released under an open-source license, together with supporting scripts that allow researchers to automatically retrieve the actual content of the commits from the corresponding repositories and to augment the attributes available for each instance. These scripts also make it possible to complement the dataset with additional instances that are not security fixes (which is useful, for example, in machine learning applications). Our dataset has been successfully used to train classifiers that could automatically identify security-relevant commits in code repositories. The release of this dataset and the supporting code as open-source will allow future research to be based on data of industrial relevance; also, it represents a concrete step towards making the maintenance of this dataset a shared effort involving open-source communities, academia, and the industry.



“…al. [7], a method that uses machine learning to analyze source code repositories and to automatically identify commits that are security-relevant (i.e., that are likely to fix a vulnerability). They treat the source code changes introduced by commits as documents written in natural language, classifying them using standard document classification methods.…”

Section: Related Work (mentioning)

confidence: 99%
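
The idea in the statement above, classifying a code change as if it were a text document, can be sketched with any standard text classifier. The example below uses a small multinomial Naive Bayes model with add-one smoothing purely as a stand-in; the statement does not say which classifier the cited work uses, and the training snippets, labels, and names (`tokenize`, `NaiveBayes`, `TRAIN`) are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(code):
    """Split source text into lowercase identifier-like tokens."""
    return [t.lower() for t in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code)]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        # Vocabulary: every token seen in any training document.
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, doc):
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)
            for t in tokens:
                score += math.log(
                    (self.word_counts[c][t] + 1) / (total + len(self.vocab))
                )
            if score > best_score:
                best, best_score = c, score
        return best


# Toy training data: invented code changes with invented labels.
TRAIN = [
    ("if (!path.startsWith(baseDir)) throw new SecurityException();", "security"),
    ("input = sanitize(input); validateToken(token);", "security"),
    ("logger.info(message); renameField(oldName, newName);", "other"),
    ("updateReadme(); bumpVersion(version);", "other"),
]

model = NaiveBayes().fit([d for d, _ in TRAIN], [l for _, l in TRAIN])
print(model.predict("validateToken(token); sanitize(path);"))
```

With four training snippets the result says nothing about real accuracy; the point is only that identifiers in a patch can play the role that words play in ordinary document classification.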

Software Vulnerability Classification Based On Deep Neural Network

Vitthalrao, Gupta 2019

IJEAT


Software vulnerabilities are among the most common issues in software engineering; for several years, many applications have suffered from vulnerabilities, information leakage, and data hijacking. Developers sometimes make mistakes while writing code, and those mistakes introduce vulnerabilities that affect the entire application. In this work, we present an approach to software vulnerability detection that applies deep learning on the basis of metadata processing. The system performs software vulnerability detection with a Deep Neural Network (DNN), and a new dynamic vulnerability classification approach is proposed. The model is built on TF-IDF and a density-based feature selection approach for the DNN. TF-IDF is used to measure the frequency and weight of each word in a vulnerability description; the Vector Space Model (VSM) is used for feature selection, to obtain an optimal set of feature terms; and the DNN is used to build a dynamic weakness classifier that makes bug detection more effective. The overall system is organized into four phases: in the first, we detect code clones to eliminate data redundancy and reduce execution time; in the second, we apply the Vector Space Model (VSM) to recommend refactoring possibilities across the code; in the third, we build the DNN module for software vulnerability detection; and in the last, we report the vulnerabilities found in the code. A partial implementation of the system has been evaluated in a Java environment and gives satisfactory results for heterogeneous code modules.
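
The abstract above relies on TF-IDF to weight the words of a vulnerability description but does not state which variant it uses. The sketch below assumes one common form, term frequency normalized by document length multiplied by the logarithm of inverse document frequency, and applies it to three invented descriptions; the function name `tf_idf` and the sample data are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length,
    idf = log(N / number of documents containing the term).
    """
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append(
            {t: (c / len(tokens)) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return weights


# Invented vulnerability descriptions, for illustration only.
DESCRIPTIONS = [
    "buffer overflow in image parser",
    "sql injection in login form",
    "buffer overflow in network parser",
]

w = tf_idf(DESCRIPTIONS)
for description, scores in zip(DESCRIPTIONS, w):
    top = max(scores, key=scores.get)
    print(f"{description!r}: highest-weighted term is {top!r}")
```

A term that occurs in every description, such as "in" here, gets weight zero, while a term confined to one description gets the highest weight; this is the property that makes TF-IDF useful as an input to feature selection.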


“…Prior work on vulnerable training sample collection [18,19,20] is characterized by a one-size-fits-all assumption. They use a single monolithic model for locating vulnerability-relevant commits.…”

Section: Introduction (mentioning)

confidence: 99%

“…We thoroughly evaluate FUNDED on large real-life datasets of code commit history and vulnerable programs written in C, Java, Php and Swift. We compare FUNDED against six state-of-the-art (SOTA) learning-based detection methods for software bugs or vulnerabilities [4,5,16,6,3,15], and five SOTA methods for automatic vulnerable code sample collection [18,19,23,20,24]. Experimental results show that FUNDED consistently outperforms competing methods across evaluation settings, by discovering more code vulnerabilities with a lower false-positive rate.…”

Section: Introduction (mentioning)

confidence: 99%