TAP: A static analysis model for PHP vulnerabilities based on token and deep learning technology

Fang, Yong; Han, Shengjun; Huang, Cheng; Wu, Runpu

doi:10.1371/journal.pone.0225196

Cited by 28 publications

(20 citation statements)

References 19 publications


“…Minimizing generalization error Data augmentation techniques can help to increase the generalization performance and robustness of a trained model, feedforward as well as recurrent, by adding plausible deviations to the training data, e.g., changes to code samples (Cui et al (2018) [S13], Cui et al (2019) [S14]), or adding noise (Li et al 2019, [S12]). An alternative and complementary approach to improve generalization performance and to prevent over-fitting of a model is adding a dropout mechanism [S5], Fang et al (2019) [S16]). Models with Dropout randomly disable connections among neurons during training forcing the model to compensate for the missing connections and thereby becoming more robust.…”

Section: Representation Learning vs. Similarity-based Search: A Majority of 30 Primary Studies (mentioning; confidence: 99%)
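The dropout mechanism in the statement above can be illustrated with a minimal sketch of standard "inverted" dropout on a layer's activations. This is a generic textbook formulation, not the architecture of TAP or of any cited study. Standard dropout zeroes unit outputs, which removes all of that unit's outgoing connections for one training step. Survivors are scaled by 1/(1-p) so that no rescaling is needed at inference time. The function name `dropout` and its parameters are illustrative assumptions.

```python
import random
from typing import List, Optional


def dropout(activations: List[float], p: float, training: bool = True,
            rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout (illustrative sketch).

    During training, each activation is zeroed with probability p and the
    surviving activations are scaled by 1/(1-p), so the expected value of
    every unit is unchanged. At inference time the input is returned as-is.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)  # identity: no units are disabled
    rng = rng or random.Random()
    keep = 1.0 - p
    # rng.random() is uniform in [0, 1); values below p are dropped.
    return [a / keep if rng.random() >= p else 0.0 for a in activations]


if __name__ == "__main__":
    layer = [1.0] * 10
    print(dropout(layer, p=0.5, rng=random.Random(0)))  # each entry is 0.0 or 2.0
    print(dropout(layer, p=0.5, training=False))        # unchanged at inference
```

Because a different random subset of units is disabled at every step, the network cannot rely on any single unit. This is the "compensation" effect the statement describes.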

“…The CWE catalog currently lists 839 individual vulnerability types, while the most elaborate primary study merely aimed to distinguish 40 vulnerabilities. Primary studies acknowledge this shortcoming and plan to adopt their works to more types of vulnerabilities (Xu et al 2018 [S31], Wu et al 2017 [S28], Fang et al (2019) [S16]) or to introduce patterns uncovering multiple types of vulnerabilities (Zou et al 2019, [S6]). CWE vulnerabilities are partly defined in taxonomic relationship, i.e.…”

Section: And Future Directions (mentioning; confidence: 99%)


Deep security analysis of program code

2021


Due to the continuous digitalization of our society, distributed and web-based applications have become omnipresent, and making them more secure is of paramount relevance. Deep learning (DL) and its representation learning approach are increasingly being proposed for program code analysis, potentially providing a powerful means of making software systems less vulnerable. This systematic literature review (SLR) aims at a thorough analysis and comparison of 32 primary studies on DL-based vulnerability analysis of program code. We found a rich variety of proposed analysis approaches, code embeddings and network topologies, and we discuss these techniques and alternatives in detail. By compiling commonalities and differences in the approaches, we identify the current state of research in this area and discuss future directions. We also provide an overview of publicly available datasets in order to foster stronger benchmarking of approaches. This SLR provides an overview and starting point for researchers interested in deep vulnerability analysis of program code.



“…The TAP tokenizer [18] proposed by Fang et al is an effective method to tokenize PHP source code. Based on the self-defined rules, they finished the audit for various vulnerabilities on the CWE dataset.…”

Section: Code Auditing With Machine Learning (mentioning; confidence: 99%)
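The statement above does not list TAP's self-defined tokenization rules. The sketch below only illustrates the general idea of rule-based, normalizing tokenization of PHP source before it is fed to a model. Every rule in it is a hypothetical assumption, including the `$var`, `<str>` and `<num>` placeholders and the decision to keep superglobals verbatim. Only the superglobal names themselves are real PHP identifiers. The sketch ignores comments, heredocs and string interpolation. PHP's own `token_get_all()` would be the more faithful choice inside a PHP toolchain.

```python
import re
from typing import List

# Hypothetical rule set (NOT TAP's actual rules): user-defined variables,
# strings and numbers are mapped to placeholders so the model sees the shape
# of the data flow; superglobals (taint sources) are kept verbatim.
TOKEN_RE = re.compile(
    r"""
    (?P<var>\$[A-Za-z_][A-Za-z0-9_]*)             # $name
  | (?P<str>'(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*")  # single- or double-quoted string
  | (?P<num>\d+(?:\.\d+)?)                        # integer or decimal literal
  | (?P<ident>[A-Za-z_][A-Za-z0-9_]*)             # keyword or function name
  | (?P<op>[^\sA-Za-z0-9_])                       # any other single character
    """,
    re.VERBOSE,
)
SUPERGLOBALS = {"$_GET", "$_POST", "$_COOKIE", "$_REQUEST", "$_SERVER", "$_FILES"}


def tokenize_php(code: str) -> List[str]:
    """Turn a PHP snippet into a normalized token list (illustrative only)."""
    tokens: List[str] = []
    for m in TOKEN_RE.finditer(code):  # whitespace matches no group and is skipped
        kind, text = m.lastgroup, m.group()
        if kind == "var":
            tokens.append(text if text in SUPERGLOBALS else "$var")
        elif kind == "str":
            tokens.append("<str>")
        elif kind == "num":
            tokens.append("<num>")
        elif kind == "ident":
            tokens.append(text.lower())  # PHP function names are case-insensitive
        else:
            tokens.append(text)
    return tokens


if __name__ == "__main__":
    print(tokenize_php('echo $_GET["name"];'))
    # ['echo', '$_GET', '[', '<str>', ']', ';']
```

Normalizing identifiers keeps the vocabulary small. Keeping taint sources such as `$_GET` intact preserves the signal most relevant to injection-style vulnerabilities.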

Cross-Site Scripting Guardian: A Static XSS Detector Based on Data Stream Input-Output Association Mining

Wang, Miao, et al. 2020

Applied Sciences

Self Cite


Web applications are the target of the largest number of cybersecurity attacks, and Cross-Site Scripting (XSS) is the most common attack against them. Code auditing is the main method of preventing XSS damage at the source code level. However, both manual audits and rule-based audit tools have numerous limitations. In the age of big data, using machine learning to assist manual auditing is a new research field. In this paper, we propose a new way to audit PHP source code snippets for XSS vulnerabilities, based on a PHP code parsing tool and a machine learning algorithm. We analyzed the operation sequence of the source code and built a model that extracts the information in the data stream most closely related to XSS attacks. The proposed method can significantly improve the recall rate on vulnerable samples. Compared with related audit methods, our method offers high reusability and excellent performance. Our classification model achieved an F1 score of 0.92, a recall rate of 0.98 (vulnerable samples), and an area under the curve (AUC) of 0.97 on the test dataset.
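The abstract above reports an F1 score alongside recall. As a reminder of how these metrics relate, here is a minimal sketch that computes precision, recall and F1 from confusion-matrix counts, treating "vulnerable" as the positive class. The counts in the example are made up for illustration and are not the paper's data. AUC is omitted because it requires ranked classifier scores rather than counts.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the positive (vulnerable) class.

    tp: vulnerable samples flagged as vulnerable
    fp: safe samples wrongly flagged as vulnerable
    fn: vulnerable samples the model missed
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts: a detector that misses 1 of 50 vulnerable samples
    # and raises no false alarms.
    p, r, f = precision_recall_f1(tp=49, fp=0, fn=1)
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Because F1 is a harmonic mean, a very high recall can coexist with a noticeably lower F1 when precision lags behind.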


“…This was the main reason for choosing open source tools, as they offer full access to the source code, which helps in understanding the evaluation results. However, some tools were discarded such as Ardilla [28] and IPAAS [29] since they do not provide source code yet and TAP [30] which is a recent tool to detect vulnerability using deep learning.…”

Section: Selected Tools (mentioning; confidence: 99%)

“…The table is divided into three grouped sets or rows; the first column of the table shows the category name, the second column shows the subject name, each subject includes a set of test cases. The number of test cases in each subject is showed in the third column, the rest of columns display the results of true positives (TP tests), false positives (FP tests) and the success percentage of the tool in each subject [30]. The total percentage is calculated by dividing the total number of passed tests by the total number of tests in a given category.…”

Section: Vulnerability Detection In Inter-benchmark Tests (mentioning; confidence: 99%)
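The statement above computes a category's total percentage by dividing the total number of passed tests by the total number of tests in that category. A minimal sketch of that pooled calculation follows. The subject names and counts are hypothetical, and what counts as a "passed" test is left to the caller because the excerpt does not define it. The example shows why pooling differs from averaging per-subject percentages when subjects have different numbers of test cases.

```python
from typing import Dict, Tuple


def category_success(subjects: Dict[str, Tuple[int, int]]) -> float:
    """Pooled success percentage for one category.

    subjects maps a (hypothetical) subject name to (passed_tests, total_tests).
    Result = 100 * sum(passed) / sum(total), i.e. larger subjects weigh more.
    """
    passed = sum(p for p, _ in subjects.values())
    total = sum(t for _, t in subjects.values())
    if total == 0:
        raise ValueError("category has no test cases")
    return 100.0 * passed / total


if __name__ == "__main__":
    # Hypothetical category with two subjects of unequal size.
    cat = {"sql_injection": (18, 20), "xss": (2, 5)}
    print(category_success(cat))  # pooled: 20 / 25 -> 80.0
    # A plain mean of per-subject percentages would instead give (90 + 40) / 2 = 65.0.
```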

Automated server-side model for recognition of security vulnerabilities in scripting languages

Abdel-Kader, Nashaat, Habib, et al. 2020

IJECE


With the increasing global accessibility of web applications, maintaining a reasonable security level for both user data and server resources has become an extremely challenging issue. Static code analysis systems can help web developers meet this challenge while reducing time and cost. In this paper, a new static analysis model is proposed, designed to discover security problems in scripting languages. The model is implemented in a prototype, SCAT (Static Code Analysis Tool), which applies the phases of the proposed model to catch security vulnerabilities in PHP 5.3. Empirical results attest that the prototype is feasible and can contribute to the security of real-world web applications. SCAT detected 94% of the security vulnerabilities found in the testing benchmarks, which indicates that the proposed model can provide an effective solution for complex web systems, helping to secure users' private data and to maintain application stability for web application providers.
