A New Software Birthmark based on Weight Sequences of Dynamic Control Flow Graph for Plagiarism Detection

Yuan, Baoguo; Wang, Junfeng; Fang, Zhiyang; Li, Qi

doi:10.1093/comjnl/bxy055

Cited by 5 publications (3 citation statements); references 9 publications.


“…We call the features that we can obtain from the semantic analysis phase (S3) semantic features. To obtain semantic features, a complex analysis, such as symbolic execution [7], [8], [15], [18], [63], dynamic evaluation of code snippets [8], [30], [31], [33], [35], [63], [64], [66], [67], or machine learning-based embedding [12], [13], [19], [20], [21], [23], [24], [25], [26], [27], [28], [29], is necessary. There are mainly seven distinct semantic features used in the previous literature, as listed in Table 1.…”

Section: Semantic Features (mentioning; confidence: 99%)
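As a rough illustration of the "dynamic evaluation of code snippets" idea in the excerpt above, the sketch below runs two functions on the same inputs and scores the fraction of inputs on which their observable results agree. It is a minimal sketch under stated assumptions, not the method of any cited work: the helper names `observe` and `io_similarity` are hypothetical, the snippets are plain Python callables rather than binary code, and real systems additionally sandbox execution and pin down the execution environment.

```python
from typing import Any, Callable, Sequence, Tuple


def observe(fn: Callable[[Any], Any], x: Any) -> Tuple[str, Any]:
    """Run fn on x and record its observable result.

    A raised exception is recorded by its type name (an assumed policy), so two
    snippets that fail in the same way on the same input count as agreeing.
    """
    try:
        return ("ok", fn(x))
    except Exception as exc:
        return ("raised", type(exc).__name__)


def io_similarity(f: Callable[[Any], Any], g: Callable[[Any], Any],
                  inputs: Sequence[Any]) -> float:
    """Fraction of shared inputs on which f and g behave identically."""
    if not inputs:
        raise ValueError("need at least one input")
    agree = sum(1 for x in inputs if observe(f, x) == observe(g, x))
    return agree / len(inputs)


def double(x: int) -> int:
    return x * 2


def shifted(x: int) -> int:
    # Syntactically different from double(), but equivalent on Python ints.
    return x << 1


def square(x: int) -> int:
    return x * x


if __name__ == "__main__":
    samples = [0, 1, 2, 3]
    print(io_similarity(double, shifted, samples))  # 1.0
    print(io_similarity(double, square, samples))   # 0.5 (agree only at 0 and 2)
```

The point of the example is that `double` and `shifted` look different syntactically yet are indistinguishable by their input/output behavior, which is exactly the property that makes such features "semantic". The quality of the score depends heavily on how the inputs are chosen.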

“…Third, the runtime behavior of a code snippet can directly express its semantics, as presented by traditional malware analysis [81]. By executing two target functions with the same execution environment, one can directly compare the executed instruction sequences [64] or visited CFG edges of the target functions [66]. For comparison, one may focus on specific behaviors observed during the execution [18], [28], [30], [31], [35], [67], [82]: the read/write values of stack and heap memory, return values from function calls, and invoked system/library function calls during the executions.…”

Section: Semantic Features (mentioning; confidence: 99%)
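The comparison of observed runtime behaviors described in the excerpt above can be sketched, for the specific case of invoked library calls, as a weighted Jaccard similarity over call n-grams taken from two execution traces. This is a minimal sketch, not the technique of any cited work: it assumes each trace has already been recorded as a list of call names (for example by a library-call tracer), and the choice of bigrams and of weighted Jaccard, as well as the function names, are our own assumptions.

```python
from collections import Counter
from typing import Sequence


def call_ngrams(calls: Sequence[str], n: int = 2) -> Counter:
    """Multiset of consecutive length-n call subsequences in one trace."""
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))


def trace_similarity(trace_a: Sequence[str], trace_b: Sequence[str], n: int = 2) -> float:
    """Weighted Jaccard similarity of the call n-gram multisets of two traces."""
    a, b = call_ngrams(trace_a, n), call_ngrams(trace_b, n)
    union = a | b  # per-n-gram maximum count
    if not union:  # both traces shorter than n: fall back to exact comparison
        return 1.0 if list(trace_a) == list(trace_b) else 0.0
    inter = a & b  # per-n-gram minimum count
    return sum(inter.values()) / sum(union.values())


if __name__ == "__main__":
    original = ["fopen", "fread", "malloc", "memcpy", "fclose"]
    # Hypothetical obfuscated copy: one irrelevant call inserted.
    obfuscated = ["fopen", "getpid", "fread", "malloc", "memcpy", "fclose"]
    print(trace_similarity(original, obfuscated))  # 0.5
```

Using bigrams rather than single calls makes the score sensitive to call order, at the cost of being more affected by inserted junk calls; with `n=1` the same pair scores 5/6.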


Revisiting Binary Code Similarity Analysis Using Interpretable Feature Engineering and Lessons Learned

Kim, Kil, et al. 2023, IEEE Trans. Software Eng.


Binary code similarity analysis (BCSA) is widely used for diverse security applications, including plagiarism detection, software license violation detection, and vulnerability discovery. Despite the surging research interest in BCSA, it is significantly challenging to perform new research in this field for several reasons. First, most existing approaches focus only on the end results, namely, increasing the success rate of BCSA, by adopting uninterpretable machine learning. Moreover, they utilize their own benchmark, sharing neither the source code nor the entire dataset. Finally, researchers often use different terminologies or even use the same technique without citing the previous literature properly, which makes it difficult to reproduce or extend previous work. To address these problems, we take a step back from the mainstream and contemplate fundamental research questions for BCSA. Why does a certain technique or a certain feature show better results than the others? Specifically, we conduct the first systematic study on the basic features used in BCSA by leveraging interpretable feature engineering on a large-scale benchmark. Our study reveals various useful insights on BCSA. For example, we show that a simple interpretable model with a few basic features can achieve a comparable result to that of recent deep learning-based approaches. Furthermore, we show that the way we compile binaries or the correctness of underlying binary analysis tools can significantly affect the performance of BCSA. Lastly, we make all our source code and benchmark public and suggest future directions in this field to help further research.
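To make "a simple interpretable model with a few basic features" concrete, the sketch below compares two functions one numeric feature at a time, so each feature's contribution to the final score can be inspected directly. The feature names, the relative-difference measure, and the unweighted mean are illustrative assumptions; this is not the exact model used in the paper.

```python
from typing import Dict


def relative_similarity(a: float, b: float) -> float:
    """Per-feature similarity 1 - |a - b| / max(|a|, |b|); in [0, 1] for counts."""
    denom = max(abs(a), abs(b))
    if denom == 0:
        return 1.0  # both values are zero, so they agree exactly
    return 1.0 - abs(a - b) / denom


def explain_similarity(fa: Dict[str, float], fb: Dict[str, float]) -> Dict[str, float]:
    """Per-feature scores for the features both functions report."""
    return {name: relative_similarity(fa[name], fb[name])
            for name in sorted(fa.keys() & fb.keys())}


def overall_score(per_feature: Dict[str, float]) -> float:
    """Unweighted mean of the per-feature scores (hypothetical aggregation)."""
    if not per_feature:
        raise ValueError("no shared features to compare")
    return sum(per_feature.values()) / len(per_feature)


if __name__ == "__main__":
    # Hypothetical feature values for one function compiled two different ways.
    build_a = {"num_basic_blocks": 10, "num_calls": 4, "num_instructions": 100}
    build_b = {"num_basic_blocks": 8, "num_calls": 4, "num_instructions": 80}
    scores = explain_similarity(build_a, build_b)
    print(scores)
    print(overall_score(scores))
```

Because the output is a dictionary of per-feature scores rather than a single opaque number, a reader can see which feature (here, basic-block and instruction counts) drives a mismatch, for example one introduced by a different compiler optimization level.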


“…An example of a method based on this strategy is presented in [Wan et al 2018], which uses a similarity-estimation technique called simhash [Charikar 2002] to detect plagiarism in Verilog HDL code. Birthmarks are characteristics of source code that are highly resistant to obfuscation attempts, in which a series of changes is made to the code to conceal plagiarism [Yuan et al 2018, Tian et al 2015]. Methods based on these characteristics try to detect plagiarism by analyzing the similarity between the birthmarks found in pairs of programs.…”

Section: Related Work (unclassified)
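The simhash idea [Charikar 2002] cited in the excerpt above can be sketched as follows: each feature is hashed to 64 bits, each bit position accumulates +weight or -weight depending on that bit, and the fingerprint keeps the sign of every position. Similar feature multisets tend to yield fingerprints with a small Hamming distance. This is a minimal sketch under our own assumptions (token 3-gram features, an MD5-derived 64-bit hash, similarity defined as one minus the normalized Hamming distance, hypothetical helper names); it does not reproduce the pipeline of [Wan et al 2018].

```python
import hashlib
import re
from collections import Counter
from typing import Iterable, List

BITS = 64  # fingerprint width; matches the 8 digest bytes used in hash64()


def hash64(feature: str) -> int:
    """Stable 64-bit hash of a feature (first 8 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(feature.encode("utf-8")).digest()[:8], "big")


def token_shingles(source: str, k: int = 3) -> List[str]:
    """Overlapping k-token windows of the source text (hypothetical feature choice)."""
    tokens = re.findall(r"\w+|[^\w\s]", source)
    return [" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]


def simhash(features: Iterable[str]) -> int:
    """Per bit, sum +w or -w over all features, then keep the sign as the bit."""
    acc = [0] * BITS
    for feature, weight in Counter(features).items():
        h = hash64(feature)
        for i in range(BITS):
            acc[i] += weight if (h >> i) & 1 else -weight
    return sum(1 << i for i in range(BITS) if acc[i] > 0)


def simhash_similarity(fp_a: int, fp_b: int) -> float:
    """1 - normalized Hamming distance between two fingerprints."""
    return 1.0 - bin(fp_a ^ fp_b).count("1") / BITS


if __name__ == "__main__":
    v1 = "always @(posedge clk) begin q <= d; end"
    v2 = "always  @( posedge clk )\n  begin q <= d ; end"  # whitespace-only edits
    print(simhash_similarity(simhash(token_shingles(v1)), simhash(token_shingles(v2))))
```

`hashlib` is used instead of Python's built-in `hash()` because the latter is salted per process for strings, which would make fingerprints incomparable across runs. Whitespace-only edits leave the token stream, and therefore the fingerprint, unchanged; renaming identifiers would not, which is one reason birthmark-style features are preferred when obfuscation is expected.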

Um Método de Detecção de Plágio para Sistemas Juiz On-line baseado no Comportamento dos Alunos (A Plagiarism Detection Method for Online Judge Systems Based on Student Behavior)

Oliveira, Filho, Oliveira, et al. 2021, Anais do XXXII Simpósio Brasileiro de Informática na Educação (SBIE 2021) [Proceedings of the 32nd Brazilian Symposium on Computers in Education]


Plagiarism is a serious and growing problem in academia that directly affects the quality of teaching. This research addresses the problem of detecting source-code plagiarism in introductory programming courses. In these courses, the code students write tends to be simple and short, which makes it hard for traditional code-similarity methods to detect plagiarism. To overcome this difficulty, this work proposes a plagiarism detection method based on evidence extracted from the logs of online judge systems, where the evidence relates to how students behave while attempting to solve programming exercises. The proposed method achieved an F-measure of 0.83 in plagiarism detection.


Advanced Persistent Threat intelligent profiling technique: A survey

Tang, Wang, et al. 2022, Computers and Electrical Engineering
