2020
DOI: 10.5455/jjcit.71-1597824949

Efficient Deep Features Learning for Vulnerability Detection Using Character N-Gram Embedding

Cited by 3 publications (5 citation statements, published in 2022 and 2024). References 0 publications.
“…Feature Types Under Text-based Code Representation. In the context of text-based code representation, several feature types have been identified that can influence the model's behavior when processing source code, including token type [31,48], token length [50], token frequency [49], token n-grams [51], token lexical patterns [52], and token attention values [53]. Token types could be categorized as comments and code.…”
Section: Taxonomy of Related Work
Confidence: 99%
“…Zeng et al. [49] concluded that better model performance is achieved when code frequency information is preserved. Token n-grams are fixed-size contiguous sequences of tokens that capture local context within a fixed window [51], but their effectiveness may be limited for longer code sequences and transformer models. By representing recurring structures in the code [52], token lexical patterns can help in understanding the code's basic logic and structure.…”
Section: Taxonomy of Related Work
Confidence: 99%
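The statement above describes token n-grams as fixed-size contiguous windows over a token sequence. A minimal sketch of that idea (not code from any of the cited papers; the sample token list is an illustrative assumption):

```python
# Minimal sketch: extracting fixed-size contiguous n-grams from a token
# sequence, capturing local context within a fixed window.

def token_ngrams(tokens, n):
    """Return all contiguous n-grams over a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical tokenized C statement, for illustration only.
tokens = ["strcpy", "(", "buf", ",", "input", ")", ";"]
trigrams = token_ngrams(tokens, 3)
# First window: ("strcpy", "(", "buf")
```

The same sliding-window scheme applies at the character level (as in the paper's character n-gram embedding), with individual characters in place of tokens.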
“…The code gadget is in turn generated using data flow analysis of the source code [68]. The system uses deep learning both to extract features and to predict vulnerabilities. The work uses C/C++ source code, which is tokenized at the slice level.…”
Section: Review of Related Work
Confidence: 99%
“…The two most common vulnerabilities considered from the NVD and SARD datasets in the research works [34] and [65] are buffer overflow (CWE-119) and resource management error (CWE-399). The works [68], [33] and [59] use collected and labelled C/C++ program source code from [45], [43], which is publicly available in the SySeVR dataset [54]. The collected instances are labelled with vulnerability types such as Function Call (FC), Array Usage (AU), Pointer Usage (PU) and Arithmetic Expression (AE).…”
Section: Review of Datasets Considered for Targeted Vulnerabilities
Confidence: 99%
“…The application of ECS can effectively complete relevant online operations and improve the convenience of people's daily lives. Due to its special computing mode and its impact, the operation of ECS involves potential risk elements [4][5][6][7] that affect user information security and hinder the optimization and upgrade of ECS. Therefore, vulnerability detection needs to be conducted effectively to reduce the risk of dual-end operation of ECS.…”
Section: Introduction
Confidence: 99%