2019
DOI: 10.1371/journal.pone.0225196

TAP: A static analysis model for PHP vulnerabilities based on token and deep learning technology

Abstract: With the widespread usage of Web applications, the security issues of source code are increasing. The exposed vulnerabilities seriously endanger the interests of service providers and customers. There are some models for solving this problem. However, most of them rely on complex graphs generated from source code or regex patterns based on expert experience. In this paper, TAP, which is based on token mechanism and deep learning technology, was proposed as an analysis model to discover the vulnerabilities of P…

citations
Cited by 28 publications
(20 citation statements)
references
References 19 publications
“…Minimizing generalization error Data augmentation techniques can help to increase the generalization performance and robustness of a trained model, feedforward as well as recurrent, by adding plausible deviations to the training data, e.g., changes to code samples (Cui et al (2018) [S13], Cui et al (2019) [S14]), or adding noise (Li et al 2019, [S12]). An alternative and complementary approach to improve generalization performance and to prevent over-fitting of a model is adding a dropout mechanism [S5], Fang et al (2019) [S16]). Models with Dropout randomly disable connections among neurons during training forcing the model to compensate for the missing connections and thereby becoming more robust.…”
Section: Representation Learning Vs Similarity-based Search a Majority Of 30 Primary Studies (mentioning)
confidence: 99%
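
The dropout mechanism described in the statement above can be illustrated with a minimal sketch. It assumes the common "inverted dropout" variant, which zeroes unit activations (rather than individual connections) with probability `p` during training and rescales the survivors; the function name `dropout` and its parameters are hypothetical and are not taken from any of the cited studies.

```python
import random


def dropout(activations, p=0.5, training=True, rng=None):
    """Inverted dropout on a list of activations (illustrative assumption only).

    During training, each value is zeroed with probability p and the
    survivors are scaled by 1 / (1 - p), so the expected value is unchanged.
    Outside training the input is returned unmodified.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - p
    # Keep a unit with probability `keep`; otherwise silence it for this pass.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

Because a different random subset of units is silenced on every training pass, the model cannot rely on any single unit, which is the robustness effect the statement refers to.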
“…The CWE catalog currently lists 839 individual vulnerability types, while the most elaborate primary study merely aimed to distinguish 40 vulnerabilities. Primary studies acknowledge this shortcoming and plan to adopt their works to more types of vulnerabilities (Xu et al 2018 [S31], Wu et al 2017 [S28], Fang et al (2019) [S16]) or to introduce patterns uncovering multiple types of vulnerabilities (Zou et al 2019, [S6]). CWE vulnerabilities are partly defined in taxonomic relationship, i.e.…”
Section: And Future Directions (mentioning)
confidence: 99%
“…The TAP tokenizer [18] proposed by Fang et al is an effective method to tokenize PHP source code. Based on the self-defined rules, they finished the audit for various vulnerabilities on the CWE dataset.…”
Section: Code Auditing With Machine Learning (mentioning)
confidence: 99%
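
To make the idea of tokenizing PHP source code concrete, here is a rough sketch of a regex-based lexer. It is not the TAP tokenizer: the token categories (`VARIABLE`, `STRING`, `NAME`, and so on), the rules, and the function name `tokenize` are all assumptions chosen for illustration, and the sketch ignores most of PHP's real lexical grammar.

```python
import re

# Hypothetical token categories; these are NOT the rules defined by TAP.
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*|#[^\n]*|/\*.*?\*/"),
    ("VARIABLE", r"\$[A-Za-z_][A-Za-z0-9_]*"),
    ("STRING", r"'(?:\\.|[^'\\])*'" + "|" + r'"(?:\\.|[^"\\])*"'),
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("NAME", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP", r"[^\sA-Za-z0-9_]"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,
)


def tokenize(php_source):
    """Return (kind, text) pairs for a PHP fragment, skipping comments."""
    tokens = []
    for match in TOKEN_RE.finditer(php_source):
        kind = match.lastgroup
        if kind == "COMMENT":
            continue  # comments carry no executable behaviour
        tokens.append((kind, match.group()))
    return tokens
```

For example, `tokenize("echo $_GET['id'];")` yields a `NAME` token for `echo`, a `VARIABLE` token for `$_GET`, and `OP`/`STRING` tokens for the array access. A sequence like this is the kind of input a downstream model could consume.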
“…This was the main reason for choosing open source tools, as they offer full access to the source code, which helps in understanding the evaluation results. However, some tools were discarded such as Ardilla [28] and IPAAS [29] since they do not provide source code yet and TAP [30] which is a recent tool to detect vulnerability using deep learning.…”
Section: Selected Tools (mentioning)
confidence: 99%
“…The table is divided into three grouped sets or rows; the first column of the table shows the category name, the second column shows the subject name, each subject includes a set of test cases. The number of test cases in each subject is showed in the third column, the rest of columns display the results of true positives (TP tests), false positives (FP tests) and the success percentage of the tool in each subject [30]. The total percentage is calculated by dividing the total number of passed tests by the total number of tests in a given category.…”
Section: Vulnerability Detection In Inter-benchmark Tests (mentioning)
confidence: 99%
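
The total percentage described in the statement above can be written out as a short calculation. This is a sketch under assumptions: each subject is represented as a hypothetical `(passed, total)` pair, and the function name `success_percentage` is not taken from the citing paper.

```python
def success_percentage(subjects):
    """Pooled success rate for one category.

    `subjects` is a sequence of (passed, total) pairs, one per subject.
    The result is total passed tests divided by total tests, in percent.
    """
    subjects = list(subjects)
    passed = sum(p for p, _ in subjects)
    total = sum(t for _, t in subjects)
    if total == 0:
        raise ValueError("category contains no test cases")
    return 100.0 * passed / total
```

Note that this pooled ratio differs from the mean of per-subject percentages whenever subjects have different numbers of test cases: `success_percentage([(1, 1), (0, 9)])` gives 10.0, whereas averaging the two per-subject rates would give 50.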