2018 IEEE/ACS 15th International Conference on Computer Systems and Applications (AICCSA)
DOI: 10.1109/aiccsa.2018.8612819

Using Machine Learning Techniques to Classify and Predict Static Code Analysis Tool Warnings

Abstract: This paper discusses our work on using software engineering metrics (i.e., source code metrics) to classify an error message generated by a Static Code Analysis (SCA) tool as a true-positive, false-positive, or false-negative. Specifically, we compare the performance of Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Random Forests, and Repeated Incremental Pruning to Produce Error Reduction (RIPPER) over eight datasets. The performance of the techniques is assessed by computing the F-measure metric, w…
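The comparison the abstract describes can be sketched in a few lines. Below is a minimal, hypothetical reconstruction (not the authors' code), assuming scikit-learn and synthetic placeholder data in place of the paper's eight datasets and real source-code metrics; each classifier is scored by cross-validated F-measure, i.e. F = 2PR / (P + R) for precision P and recall R. RIPPER is left as a comment because scikit-learn does not implement it.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data: rows = SCA warnings, columns = hypothetical source-code
# metrics (e.g., LOC, cyclomatic complexity); label 1 = true-positive warning.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

models = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Random Forest": RandomForestClassifier(random_state=0),
    # RIPPER: no scikit-learn implementation; a rule learner such as the
    # third-party `wittgenstein` package could be slotted in here.
}

for name, model in models.items():
    f1 = cross_val_score(model, X, y, cv=5, scoring="f1")
    print(f"{name}: mean F-measure = {f1.mean():.3f}")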

Cited by 13 publications (9 citation statements). References 14 publications.
“…Benchmarks: We used C projects in CoreBench [14] (we used the first listed buggy version of all 4 projects) and BugBench [48] (it released 11 programs, 8 of which are C programs that we used). The 12 projects consist of 2.9 million lines of code (sloc), shown in the first two columns of Table 1. From the Commercial Tool and PolySpace, we processed a total of 1955 warnings of 41 categories.…”
Section: Static Analysis Tools (mentioning)
confidence: 99%
“…There are also approaches that identify patterns from warnings, source code, and software repositories for predicting false positives [7, 9, 17, 40, 45, 59, 71, 74], and that use machine learning techniques to learn which warnings are likely true and false positives [7, 23, 39, 45, 59, 70]. For example, Zhang et al. automatically learned and integrated the users' feedback to rank the warnings [76].…”
Section: Related Work (mentioning)
confidence: 99%
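As an illustration of the feedback-based ranking idea attributed to Zhang et al. above, here is a hedged sketch (the SGDClassifier choice and all feature vectors and labels are assumptions, not the cited paper's method): a linear model scores each warning, a user labels the top few, and partial_fit folds that feedback back in before re-ranking.

import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))                 # one feature vector per warning
seed_labels = (X[:10, 0] > 0).astype(int)     # a few initially labeled warnings

model = SGDClassifier(loss="log_loss", random_state=0)
model.partial_fit(X[:10], seed_labels, classes=[0, 1])

def rank(model, X):
    # Higher score = more likely a true positive, so review those first.
    return np.argsort(-model.decision_function(X))

order = rank(model, X)
feedback = (X[order[:5], 0] > 0).astype(int)  # stand-in for user feedback
model.partial_fit(X[order[:5]], feedback)     # integrate the feedback
print(rank(model, X)[:10])                    # re-ranked warning indices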
“…Then, they tagged the samples. Similarly, Alikhashashneh et al. [11] used the Understand tool to compute various metrics, and applied them to the Juliet test suite for C++.…”
Section: Quality Assessment/Prediction (mentioning)
confidence: 99%
“…Ribeiro et al. [259] generated features only from the warnings (such as redundancy level and number of warnings in the same file). Some studies [11, 158] used source code metrics as features.…”
Section: Quality Assessment/Prediction (mentioning)
confidence: 99%
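The warning-only features mentioned in this quote are easy to picture. The sketch below is illustrative (the ScaWarning record and both feature names are invented here, not taken from Ribeiro et al. [259]): every feature is derived purely from the warning set itself, such as the number of warnings in the same file.

from collections import Counter
from dataclasses import dataclass

@dataclass
class ScaWarning:
    file: str
    line: int
    category: str

def warning_features(warnings):
    per_file = Counter(w.file for w in warnings)
    per_file_category = Counter((w.file, w.category) for w in warnings)
    return [
        {
            "warnings_in_same_file": per_file[w.file],
            # A crude "redundancy level": same category repeated in one file.
            "same_category_in_file": per_file_category[(w.file, w.category)],
        }
        for w in warnings
    ]

demo = [ScaWarning("a.c", 10, "null-deref"),
        ScaWarning("a.c", 42, "null-deref"),
        ScaWarning("b.c", 7, "leak")]
print(warning_features(demo))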
“…They proposed to train their SVM model selectively, using only harmless code structures, to predict only FPAs. The authors in [21] and [22] use ML techniques to reduce false alerts. The authors in [23] use lexical tokens, labeled by humans, to train their CNN classifier to reduce false alerts. They propose a continuous mechanism for code integration after review.…”
Section: A. Machine Learning-Based Approaches (mentioning)
confidence: 99%
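If the "train only on harmless code structures" idea in this quote is read as one-class learning, a minimal sketch looks like the following (the OneClassSVM choice and the placeholder feature vectors are assumptions; the cited papers' actual code representations are not reproduced here): fit on known false-positive alarms only, then treat new warnings that resemble that set as probable false alarms.

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
harmless = rng.normal(loc=0.0, size=(200, 4))  # known false-positive alarms only
unseen = np.vstack([
    rng.normal(loc=0.0, size=(5, 4)),          # resembles the harmless set
    rng.normal(loc=4.0, size=(5, 4)),          # does not
])

clf = OneClassSVM(gamma="scale", nu=0.1).fit(harmless)
pred = clf.predict(unseen)  # +1 = like the harmless set, -1 = outlier
print(pred)                 # warnings scored +1 would be filtered as likely FPAs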