Santanu Kumar Dash scite author profile

Android malware is now pervasive and evolving rapidly. Thousands of malware samples are discovered every day with new models of attacks. The growth of these threats has come hand in hand with the proliferation of collective repositories sharing the latest specimens. Having access to a large number of samples opens new research directions aiming at efficiently vetting apps. However, automatically inferring a reference dataset from those repositories is not straightforward and can inadvertently lead to unforeseen misconceptions. On the one hand, samples are often mis-labeled as different parties use distinct naming schemes for the same sample. On the other hand, samples are frequently mis-classified due to conceptual errors made during labeling processes. In this paper, we mine Anti-Virus labels and analyze the associations between all labels given by different vendors to systematically unify common samples into family groups. The key novelty of our approach, named EU-PHONY [20], is that no a-priori knowledge on malware families is needed. We evaluate EUPHONY using reference datasets and more than 400 thousands additional samples outside of these datasets. Results show that EUPHONY can accurately label malware with a fine-grained clustering of families, while providing competitive performance against the state-of-the-art.

show abstract

DroidSieve

Suárez-Tangil

Dash

Ahmadi

et al. 2017

156

View full text Add to dashboard Cite

show abstract

A theory of dual channel constraints

Casalnuovo

Barr

Dash

et al. 2020

View full text Add to dashboard Cite

The surprising predictability of source code has triggered a boom in tools using language models for code. Code is much more predictable than natural language, but the reasons are not well understood. We propose a dual channel view of code; code combines a formal channel for specifying execution and a natural language channel in the form of identifiers and comments that assists human comprehension. Computers ignore the natural language channel, but developers read both and, when writing code for longterm use and maintenance, consider each channel's audience: computer and human. As developers hold both channels in mind when coding, we posit that the two channels interact and constrain each other; we call these dual channel constraints. Their impact has been neglected. We describe how they can lead to humans writing code in a way more predictable than natural language, highlight pioneering research that has implicitly or explicitly used parts of this theory, and drive new research, such as systematically searching for cross-channel inconsistencies. Dual channel constraints provide an exciting opportunity as truly multidisciplinary research; for computer scientists they promise improvements to program analysis via a more holistic approach to code, and to psycholinguists they promise a novel environment for studying linguistic processes.

show abstract

A scalable approach to computing representative lowest common ancestor in directed acyclic graphs

Dash

Scholz

Herhut

et al. 2013

Theoretical Computer Science

View full text Add to dashboard Cite

LCA computation for vertex pairs in trees can be achieved in constant time after linear-time preprocessing. However, extension of these techniques to compute LCA for vertex-pairs in DAGs has been not possible due to the non-tree edges in a DAG. In this paper, we present an algorithm for computing the LCA for vertex pairs in a DAG which treats the DAG??s spanning tree and its non-tree edges separately. Our approach enables us to tap the efficiency of existing LCA algorithms for trees. Furthermore, our algorithm decomposes the DAG into a set of component trees called clusters which significantly reduces the preprocessing necessary to incorporate non-tree edges in the LCA computation. Our algorithm seamlessly interpolates the performance graph between the best reported algorithms for trees and the best reported algorithms for DAGs depending on the incidence of non-tree edges in the DAG. Using the proposed techniques, it is possible to achieve near-linear preprocessing and constant query time for sparse DAG

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.