Proceedings of the 1st Workshop on Natural Language Processing for Programming (NLP4Prog 2021) 2021
DOI: 10.18653/v1/2021.nlp4prog-1.7

Shellcode_IA32: A Dataset for Automatic Shellcode Generation

Abstract: We take the first step to address the task of automatically generating shellcodes, i.e., small pieces of code used as a payload in the exploitation of a software vulnerability, starting from natural language comments. We assemble and release a novel dataset (Shellcode_IA32), consisting of challenging but common assembly instructions with their natural language descriptions. We experiment with standard methods in neural machine translation (NMT) to establish baseline performance levels on this task.

Cited by 11 publications (8 citation statements)
References 19 publications
“…BLEU is used to evaluate code generation systems since many prior works in code generation formulated the problem as a machine translation problem of translating English to code snippets (e.g. (Liguori et al, 2021a)). Both exact match and averaged token level BLEU scores have been extensively used in evaluating code generation models (Liguori et al, 2021a,b;Oda et al, 2015b;Ling et al, 2016;Gemmell et al, 2020).…”
Section: Discussion
confidence: 99%
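
The two metrics named in this citation statement, exact match and averaged token-level BLEU, can be sketched as follows. This is a minimal illustration and not the evaluation script of any cited work. It assumes whitespace tokenization, a single reference per hypothesis, no smoothing, and (as a simplification for short assembly lines) n-gram orders capped at the hypothesis length. The function names and the example instruction pairs are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def exact_match(hypotheses, references):
    """Fraction of hypotheses whose tokens equal the reference tokens."""
    matches = sum(h.split() == r.split() for h, r in zip(hypotheses, references))
    return matches / len(references)


def sentence_bleu(hypothesis, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against a single reference.

    Assumption: orders longer than the hypothesis are skipped rather than
    scored as zero, so very short snippets can still receive a score.
    """
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        total = sum(hyp_counts.values())
        if total == 0:
            break  # hypothesis has fewer than n tokens
        # Clip each n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if clipped == 0:
            return 0.0  # no smoothing: one empty order zeroes the score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty for hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(log_precisions) / len(log_precisions))


def average_bleu(hypotheses, references, max_n=4):
    """Mean of the sentence-level scores over all pairs."""
    scores = [sentence_bleu(h, r, max_n) for h, r in zip(hypotheses, references)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical model outputs and references, for illustration only.
    hyps = ["mov eax, 1", "xor ebx, ebx"]
    refs = ["mov eax, 1", "xor eax, eax"]
    print(exact_match(hyps, refs))
    print(average_bleu(hyps, refs, max_n=2))
```

Because the score is unsmoothed, a near-miss such as a wrong register can drop to zero at higher n-gram orders, which is one reason exact match is typically reported alongside BLEU for short code snippets.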
“…Furthermore, code generation enables users to build code more effectively and efficiently and enhance the overall software engineering process. Code software engineering (Miltner et al, 2019;, robotics (Kuhlmann et al, 2004), and cyber-security (You et al, 2017;Liguori et al, 2021a;Frempong et al, 2021; (a) An assembly code generation task. The task is to generate the assembly code that is then compiled into shellcode (small pieces of code used as a payload to exploit software vulnerabilities) using the natural language descriptions on the right.…”
Section: Overview
confidence: 99%