Sergiu Zaharia scite author profile

Sergiu Zaharia

4 Publications

2 Citation Statements Received

0 Citation Statements Given

Affiliations

Polytechnic University of Bucharest

Publications

Machine Learning-Based Security Pattern Recognition Techniques for Code Developers

2022

Software developers are the bastion of application security against the overwhelming number of cyber-attacks that target organizations and affect their resilience. Because the security weaknesses that may be introduced while writing code are complex and call for varied skills, most applications are launched intrinsically vulnerable. We have advanced our research on a security scanner that uses machine learning algorithms to recognize patterns of security weaknesses in source code. To make the scanner independent of the programming language, the source code is converted to a vector representation using natural language processing methods, which retain semantic traits of the original code while reducing the dependency on the lexical structure of the program. The detection performance is within the range accepted by software security professionals (recall > 0.94), even when vulnerable samples are heavily underrepresented in the dataset (e.g., less than 4% vulnerable code for a specific CWE). No significant adaptation is needed to change the source code language under scrutiny. We apply this approach to detecting Common Weakness Enumeration (CWE) vulnerabilities in datasets provided by NIST (test suites from the NIST Software Assurance Reference Dataset).
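The abstract reports recall rather than accuracy, and a small worked example may help show why that matters when vulnerable samples are rare. The sketch below is illustrative only: the dataset size, the 3% vulnerable share, and both sets of predictions are hypothetical and are not taken from the paper.

```python
def recall(y_true, y_pred):
    """Fraction of truly vulnerable samples (label 1) that were flagged."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


def accuracy(y_true, y_pred):
    """Fraction of all samples labeled correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical dataset: 1000 samples, 30 of them vulnerable (3%).
y_true = [1] * 30 + [0] * 970

# A useless scanner that never flags anything.
always_safe = [0] * 1000

# A hypothetical detector: misses 1 vulnerable sample, raises 20 false alarms.
detector = [1] * 29 + [0] * 1 + [1] * 20 + [0] * 950

print("always safe:", accuracy(y_true, always_safe), recall(y_true, always_safe))
print("detector:   ", accuracy(y_true, detector), recall(y_true, detector))
```

The scanner that never flags anything still reaches 97% accuracy on this data while catching no vulnerabilities at all, which is why recall is the more informative figure on imbalanced datasets.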

Detection of Software Security Weaknesses Using Cross-Language Source Code Representation (CLaSCoRe)

2023

The research presented in this paper aims to increase the capacity to identify security weaknesses in programming languages that are less supported by specialized security analysis tools, based on the knowledge gathered from securing the popular ones, for which security experts, scanners, and labeled datasets are generally available. This goal is vital in reducing the overall exposure of software applications. We propose a solution that extends security weakness detection to downstream languages, influenced by their more popular “ancestors” in the programming languages’ evolutionary tree, using language keyword tokenization and clustering based on word embedding techniques. We show that after training a machine learning algorithm on C, C++, and Java applications developed by a community of programmers with similar coding behavior, we can detect, with acceptable accuracy, similar vulnerabilities in C# source code written by the same community. To achieve this, we propose a core cross-language representation of source code, optimized for security weakness classifiers, named CLaSCoRe. Using this method, we can achieve zero-shot vulnerability detection, that is, in our case, detection without any C# source code in the training data.
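The idea of mapping keywords from different languages onto shared cluster tokens can be sketched roughly as follows. This is not the CLaSCoRe implementation: the cluster table here is hand-written and hypothetical (the abstract says the clusters come from word embeddings), and the names `KEYWORD_CLUSTERS`, `to_shared_tokens`, and `keyword_skeleton` are invented for illustration.

```python
import re

# Hypothetical, hand-written clusters; the paper derives its clusters
# from word embeddings rather than from a fixed table like this one.
KEYWORD_CLUSTERS = {
    "LOOP": {"for", "while", "do", "foreach"},
    "BRANCH": {"if", "else", "switch", "case"},
    "ALLOC": {"new", "malloc", "calloc"},
    "FREE": {"free", "delete"},
    "EXC": {"try", "catch", "finally", "throw"},
    "RET": {"return"},
}
KEYWORD_TO_CLUSTER = {
    kw: cluster for cluster, kws in KEYWORD_CLUSTERS.items() for kw in kws
}

# Identifiers/keywords, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def to_shared_tokens(source):
    """Replace keywords with cluster names and abstract away identifiers."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORD_TO_CLUSTER:
            tokens.append(KEYWORD_TO_CLUSTER[tok])
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")   # any other name, including type names
        elif tok[0].isdigit():
            tokens.append("NUM")
        else:
            tokens.append(tok)    # punctuation and operators kept as-is
    return tokens


def keyword_skeleton(source):
    """Keep only the cluster tokens, in order of appearance."""
    return [t for t in to_shared_tokens(source) if t in KEYWORD_CLUSTERS]


java_snippet = "for (String s : items) { if (s == null) return 0; }"
csharp_snippet = "foreach (string s in items) { if (s == null) return 0; }"

print(keyword_skeleton(java_snippet))
print(keyword_skeleton(csharp_snippet))
```

Both snippets reduce to the same sequence of cluster tokens even though the loop keyword differs, which is the property a classifier trained on one language would rely on when applied to another.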

CWE Pattern Identification using Semantical Clustering of Programming Language Keywords

Zaharia, Rebedea, Trăușan-Matu

2021

Source Code Vulnerabilities Detection Using Loosely Coupled Data and Control Flows

Zaharia, Rebedea, Trăușan-Matu

2019
