2020
DOI: 10.1145/3418463

IR2Vec

Abstract: We propose IR2Vec, a concise and scalable encoding infrastructure to represent programs as a distributed embedding in continuous space. This distributed embedding is obtained by combining representation learning methods with flow information to capture the syntax as well as the semantics of the input programs. As our infrastructure is based on the Intermediate Representation (IR) of the source code, obtained embeddings are both language and machine independent. The entities of the IR …
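To picture the idea described in the abstract, a program-level embedding can be composed bottom-up from vectors of IR entities. A minimal sketch in Python, assuming a toy vocabulary, random seed vectors, and a simple weighted sum (none of which are IR2Vec's actual seed embeddings or composition rules):

```python
# Minimal sketch: compose a program-level embedding from per-entity
# vectors of its IR instructions. The vocabulary, dimension, and the
# weighted sum below are illustrative assumptions, not the actual
# IR2Vec seed embeddings or composition rules.
import numpy as np

DIM = 8  # toy embedding dimension (the real dimension differs)
rng = np.random.default_rng(0)

# Hypothetical seed embeddings for IR entities (opcodes, types, operands).
seed = {ent: rng.normal(size=DIM)
        for ent in ["add", "load", "store", "i32", "ptr", "var", "const"]}

def instruction_vector(opcode, type_, operands, w=(1.0, 0.5, 0.2)):
    """Combine opcode, type, and operand vectors with fixed weights."""
    vec = w[0] * seed[opcode] + w[1] * seed[type_]
    for op in operands:
        vec += w[2] * seed[op]
    return vec

def program_vector(instructions):
    """Sum instruction vectors into one program-level embedding."""
    return sum(instruction_vector(*ins) for ins in instructions)

# Example: embed a two-instruction toy program.
prog = [("load", "i32", ["ptr"]), ("add", "i32", ["var", "const"])]
print(program_vector(prog))
```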

Cited by 45 publications (7 citation statements)
References 41 publications
“…Machine Learning models. We also note that more or less computation-intensive ML models are used for prediction, ranging from simple decision trees or support vector machines [39], [48], [49], to complex deep learning methods [41], [45]. Multiple models can also be considered, as illustrated by Roy et al. [38].…”
Section: Discussion
confidence: 99%
“…They can match or even surpass advanced methods using only a simple LSTM [21] and pre-trained embeddings. VenkataKeerthy et al. [22] provided IR2Vec, a concise and scalable encoding infrastructure to represent programs as distributed embeddings in continuous space. This method constructs symbolic and flow-aware embeddings for LLVM entities and maps them to real-valued distributed embeddings.…”
Section: Intermediate Representation
confidence: 99%
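The "flow-aware" part of the excerpt above can be pictured as propagating value-flow information into each instruction's vector. A minimal sketch, assuming a simple fixed-point blend over use-def edges (the update rule, mixing factor, and function names are illustrative, not IR2Vec's actual algorithm):

```python
# Sketch of a "flow-aware" refinement: blend each instruction's symbolic
# vector with the vectors of instructions whose values it uses.
# The fixed-point loop and the 0.3 mixing factor are assumptions.
import numpy as np

def flow_aware(sym_vecs, uses, alpha=0.3, iters=10):
    """sym_vecs: symbolic vector per instruction.
    uses: uses[i] = indices of instructions that instruction i reads from."""
    vecs = [v.copy() for v in sym_vecs]
    for _ in range(iters):
        for i, deps in enumerate(uses):
            if deps:
                incoming = np.mean([vecs[j] for j in deps], axis=0)
                vecs[i] = (1 - alpha) * sym_vecs[i] + alpha * incoming
    return vecs

# Toy example: instruction 2 uses values defined by instructions 0 and 1.
rng = np.random.default_rng(1)
sym = [rng.normal(size=4) for _ in range(3)]
print(flow_aware(sym, uses=[[], [], [0, 1]])[2])
```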
“…One solution is representing code in different languages with a uniform compiler-generated intermediate representation (IR). The model could be trained on IR (Ben-Nun, Jakobovits, and Hoefler 2018; VenkataKeerthy et al. 2020) rather than on the source code, allowing it to learn common patterns across different languages. However, obtaining IR for different programming languages requires intensive domain expertise and engineering efforts to fix compilation errors, making it infeasible for language extension.…”
Section: Introduction
confidence: 99%
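The excerpt's premise, lowering several source languages to one uniform IR, can be sketched with clang, which emits textual LLVM IR for both C and C++. A minimal sketch, assuming clang and clang++ are on PATH and the file paths are placeholders:

```python
# Sketch: lower sources in different languages to a single IR (LLVM).
# Assumes clang/clang++ are installed; file names are placeholders.
import subprocess

def to_llvm_ir(source_path, out_path):
    """Emit textual LLVM IR (.ll) for a C or C++ source file."""
    compiler = "clang++" if source_path.endswith(".cpp") else "clang"
    subprocess.run(
        [compiler, "-S", "-emit-llvm", source_path, "-o", out_path],
        check=True,
    )

# Both land in the same representation, so one model can consume both.
to_llvm_ir("example.c", "example_c.ll")
to_llvm_ir("example.cpp", "example_cpp.ll")
```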