Much research in software engineering relies on experimental studies based on fault injection. Fault injection, however, often fails to emulate real-world software faults, since it "blindly" injects large numbers of faults. Indeed, it remains challenging to inject few but realistic faults that target a particular functionality of a program. In this work, we introduce iBiR, a fault injection tool that addresses this challenge by exploiting change patterns associated with user-reported faults. To inject realistic faults, we create mutants by re-targeting a bug-report-driven automated program repair system, i.e., by reversing its code transformation templates. iBiR is further appealing in practice because it requires deep knowledge of neither the code nor the tests, but only of the program's relevant bug reports; our approach thus focuses the fault injection on the functionality targeted by the bug report. We assess iBiR on the Defects4J dataset. Experimental results show that our approach outperforms the fault injection performed by traditional mutation testing in terms of semantic similarity to the original bugs, whether applied at the system or class level of granularity, and provides better, statistically significant estimations of test effectiveness (fault detection). Additionally, when injecting 100 faults, iBiR injects faults that couple with the real ones in around 36% of cases, while mutation testing achieves less than 4%.
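The core idea of injecting a fault by reversing a repair template can be illustrated with a minimal sketch. Automated repair systems often add a missing null check; run backwards, the same template removes an existing guard, re-introducing a plausible null-pointer fault. The regex template and function below are illustrative assumptions, not iBiR's actual implementation:

```python
import re

def inject_null_guard_fault(java_src: str) -> str:
    """Invert a common repair template: where a fix would ADD a null
    check, the injector REMOVES one, re-introducing a plausible NPE.
    Illustrative only -- iBiR's real transformation templates are richer."""
    # Match a guard of the form `if (x != null) { body }` and keep only the body.
    pattern = re.compile(r"if\s*\(\s*(\w+)\s*!=\s*null\s*\)\s*\{([^{}]*)\}")
    return pattern.sub(lambda m: m.group(2).strip(), java_src)

# The guard disappears, yielding a mutant that crashes when `user` is null:
print(inject_null_guard_fault('if (user != null) { user.save(); }'))
```

In practice such a template would only be applied at locations that fault localisation, driven by the bug report, flags as relevant.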
Current Java Virtual Machine (JVM) fuzzers aim to generate syntactically valid Java programs, without targeting any particular use of the standard Java library. While effective, such fuzzers fail to discover specific kinds of bugs or vulnerabilities, such as type confusions, that are related to standard API usage. To address this issue, we introduce a mutation-based, feedback-guided, black-box JVM fuzzer called CONFUZZION. CONFUZZION, as the name suggests, targets security-relevant object-oriented flaws, with a particular focus on type confusion vulnerabilities. We show that in less than 4 hours, on commodity hardware and without any predefined initialization seed, CONFUZZION automatically generates Java programs that reveal JVM vulnerabilities, i.e., the Common Vulnerabilities and Exposures entry CVE-2017-3272. We also show that state-of-the-art fuzzers, and even traditional automatic testing techniques, are not capable of detecting such faults, even after 48 hours of execution in the same environment. To the best of our knowledge, CONFUZZION is the first fuzzer able to detect JVM type confusion vulnerabilities.
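The general shape of a mutation-based, feedback-guided fuzz loop, which CONFUZZION instantiates for Java program generation, can be sketched as follows. This is the textbook scheme, not CONFUZZION's actual algorithm; the `execute` and `is_interesting` callbacks stand in for running the JVM target and judging its feedback signal:

```python
import random

def mutate(data: bytes) -> bytes:
    """Flip one random byte -- the simplest possible structural mutation."""
    if not data:
        return data
    i = random.randrange(len(data))
    return data[:i] + bytes([data[i] ^ 0xFF]) + data[i + 1:]

def fuzz(seed: bytes, execute, is_interesting, iterations=1000):
    """Generic mutation-based feedback-guided loop (a sketch of the scheme,
    not CONFUZZION's implementation). `execute` runs the target and returns
    a feedback signal; `is_interesting` decides whether that signal warrants
    keeping the mutated input in the corpus for further mutation."""
    corpus = [seed]
    crashes = []
    for _ in range(iterations):
        parent = random.choice(corpus)
        child = mutate(parent)
        try:
            feedback = execute(child)
        except Exception:
            crashes.append(child)   # a crashing input is always kept
            continue
        if is_interesting(feedback):
            corpus.append(child)    # keep inputs that exhibit new behaviour
    return corpus, crashes
```

CONFUZZION's contribution lies in what it mutates (object-oriented Java programs exercising the standard API) and in the feedback it uses to steer toward type-confusion-prone states, rather than in the loop itself.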
Fault seeding is typically used in controlled studies to evaluate and compare test techniques. Central to these techniques lies the hypothesis that artificially seeded faults exhibit some form of realistic properties and thus provide realistic experimental results. In an attempt to strengthen realism, a recent line of research uses advanced machine learning techniques, such as deep learning and Natural Language Processing (NLP), to seed faults that look like (syntactically) real ones, implying that fault realism is related to syntactic similarity. This raises the question of whether seeding syntactically similar faults indeed results in semantically similar faults, and more generally whether syntactically dissimilar faults are far away (semantically) from the real ones. We answer this question by employing four fault-seeding techniques (PiTest, a popular mutation testing tool; IBIR, a tool with manually crafted fault patterns; DeepMutation, a learning-based fault-seeding framework; and CodeBERT, a novel mutation testing tool that uses code embeddings) and demonstrate that syntactic similarity does not reflect semantic similarity. We also show that 60%, 47%, 43%, and 7% of the real faults of Defects4J v2 are semantically resembled by CodeBERT, PiTest, IBIR, and DeepMutation faults, respectively. We then perform an objective comparison between the techniques and find that CodeBERT and PiTest have similar fault detection capabilities that subsume those of IBIR and DeepMutation, and that IBIR is the most cost-effective technique. Moreover, the overall fault detection of PiTest, CodeBERT, IBIR, and DeepMutation was, on average, 54%, 53%, 37%, and 7%, respectively.
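Semantic similarity between a seeded fault and a real one is commonly measured by comparing the sets of tests each fault makes fail; a standard similarity coefficient for such sets is Ochiai. Whether this exact formula is the one used in the study is an assumption here; the sketch below simply shows the kind of computation involved:

```python
from math import sqrt

def ochiai(failing_real: set, failing_seeded: set) -> float:
    """Similarity between a real and a seeded fault, measured as the Ochiai
    coefficient over the sets of tests each one makes fail. A value of 1.0
    means the two faults fail exactly the same tests (semantically identical
    from the test suite's point of view); 0.0 means no overlap.
    Assumed metric, not necessarily the study's exact definition."""
    if not failing_real or not failing_seeded:
        return 0.0
    shared = len(failing_real & failing_seeded)
    return shared / sqrt(len(failing_real) * len(failing_seeded))

# Two faults whose failing-test sets overlap on 2 of 3 tests each:
print(ochiai({"t1", "t2", "t3"}, {"t2", "t3", "t4"}))
```

Note that a mutant can score high on such a semantic measure while looking nothing like the real fault syntactically, which is exactly the dissociation the study reports.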
In this paper, we propose two empirical studies to (1) detect Android malware and (2) classify Android malware into families. We first (1) reproduce the results of MalBERT using BERT models trained on Android application manifests obtained from 265k applications (vs. 22k for MalBERT) from the AndroZoo dataset, in order to detect malware. The results reported in the MalBERT paper are excellent yet hard to believe, since a manifest only roughly represents an application; we therefore try to answer the following questions in this paper. Are the experiments from MalBERT reproducible? How important are permissions for malware detection? Is it possible to maintain or improve the results while reducing the size of the manifests? We then (2) investigate whether BERT can be used to classify Android malware into families. The results show that BERT can successfully differentiate malware from goodware with 97% accuracy. Furthermore, BERT can classify malware families with 93% accuracy. We also demonstrate that Android permissions are not what allows BERT to classify successfully, and indeed that it does not actually need them.

Introduction

Android malware are malicious applications that attack end-users' devices, data, money, software, or third-party applications and services [5]. With the democratization of smartphones, virtually everyone nowadays carries a device every day that can access, store, and manipulate sensitive and private data. Android, being the most used smartphone operating system, is a target of choice for attackers, who create malicious applications that aim to obtain financial gains from often unsuspecting users. In fact, new malware are constantly being released [19], posing a constant threat and challenge to users, application-market maintainers, and security researchers. Consequently, much effort and many resources are spent on developing approaches that can automatically detect malware in the unending flow of new applications.
This includes detection approaches at the app-store level, such as Google Play Store [2], or at the device level via anti-viruses [5]. Practitioners and researchers are in a constant race against the load of newly appearing malware, trying to detect not only previously identified malware but also new ones. For this purpose, they propose approaches that classify applications as malware or not, depending on relevant suspiciousness-related components appearing in the applications. These approaches fall into two main categories: static and dynamic analysis techniques. Approaches based on static analysis aim to identify malware by parsing and evaluating the syntax of the application, while dynamic approaches extract information about applications by instrumenting and running them, in order to capture any malicious or suspicious behavior of the application during its execution. Additionally, a third category of approaches, hybrid ones, combines both static and dynamic analysis, in the hope of obtaining more and better information that could be leveraged...
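Static analysis in its simplest form can be shown in a few lines: extracting the permissions an app requests directly from its AndroidManifest.xml, without ever running the app. This toy extractor is a sketch of the idea only, far from a production analyzer; the inline manifest is an illustrative example:

```python
import xml.etree.ElementTree as ET

# Android manifest attributes live in this XML namespace.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

def extract_permissions(manifest_xml: str) -> list:
    """Static analysis in miniature: list the permissions an app requests,
    read straight from its AndroidManifest.xml, without executing the app."""
    root = ET.fromstring(manifest_xml)
    return [elem.get(ANDROID_NS + "name")
            for elem in root.iter("uses-permission")]

sample = """<manifest xmlns:android="http://schemas.android.com/apk/res/android">
  <uses-permission android:name="android.permission.INTERNET"/>
  <uses-permission android:name="android.permission.READ_SMS"/>
</manifest>"""
print(extract_permissions(sample))
```

Features like these permission lists are exactly what many static detectors, and the manifest-based BERT models discussed above, take as input.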
Mutation testing is an established fault-based testing technique. It operates by seeding faults into the programs under test and asking developers to write tests that reveal these faults. These tests have the potential to reveal a large number of faults, those that couple with the seeded ones, and are thus deemed important. To this end, mutation testing should seed faults that are both "natural", in the sense of being easily understood by developers, and strong (having a high chance of revealing faults). To achieve this, we propose using pre-trained generative language models (i.e., CodeBERT) that have the ability to produce developer-like code that operates similarly, but not identically, to the target code. This means that the models have the ability to seed natural faults, thereby offering opportunities to perform mutation testing. We realise this idea by implementing µBERT, a mutation testing technique that performs mutation testing using CodeBERT, and empirically evaluate it using 689 faulty program versions. Our results show that the fault revelation ability of µBERT is higher than that of a state-of-the-art mutation testing tool (PiTest), yielding tests that have up to 17% higher fault detection potential than those of PiTest. Moreover, we observe that µBERT can complement PiTest, being able to detect 47 bugs missed by PiTest, while at the same time PiTest can find 13 bugs missed by µBERT.
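The mask-and-predict workflow behind a CodeBERT-style mutant generator can be sketched as follows: mask one token at a time, ask a fill-mask model for replacements, and keep predictions that differ from the original token. The `predict` callback below is a stub standing in for a real model; the tokenization and filtering are simplified assumptions, not µBERT's actual pipeline:

```python
import re

def mask_each_token(line: str):
    """Yield copies of `line` with one token at a time replaced by <mask>,
    the masking step of a CodeBERT-style mutant generator (naive tokenizer)."""
    tokens = re.findall(r"\w+|\S", line)
    for i in range(len(tokens)):
        masked = tokens[:i] + ["<mask>"] + tokens[i + 1:]
        yield " ".join(masked), tokens[i]

def generate_mutants(line: str, predict):
    """For each masked position, ask a fill-mask model for candidate tokens
    and keep those that differ from the original, i.e., 'natural' mutants.
    `predict` stands in for a real model such as a CodeBERT fill-mask head."""
    mutants = []
    for masked, original in mask_each_token(line):
        for candidate in predict(masked):
            if candidate != original:
                mutants.append(masked.replace("<mask>", candidate))
    return mutants

# Stub model that always proposes "<=", for illustration only:
print(generate_mutants("a < b", lambda masked: ["<="]))
```

With a real model, the predictions are context-dependent, which is what makes the resulting mutants read like plausible developer code rather than arbitrary syntactic edits.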