Although various techniques have been proposed to generate adversarial samples for white-box attacks on text, little attention has been paid to black-box attacks, which are a more realistic scenario. In this paper, we present a novel algorithm, DeepWordBug, to effectively generate small text perturbations in a black-box setting that force a deep-learning classifier to misclassify a text input. We employ novel scoring strategies to identify the critical tokens that, if modified, cause the classifier to make an incorrect prediction. Simple character-level transformations are applied to the highest-ranked tokens in order to minimize the edit distance of the perturbation while still changing the original classification. We evaluated DeepWordBug on eight real-world text datasets covering text classification, sentiment analysis, and spam detection. We compare the results of DeepWordBug with two baselines: Random (Black-box) and Gradient (White-box). Our experimental results indicate that DeepWordBug reduces the prediction accuracy of current state-of-the-art deep-learning models, including a decrease of 68% on average for a Word-LSTM model and 48% on average for a Char-CNN model.
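The score-then-perturb idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the leave-one-out score below stands in for the paper's scoring strategies, the adjacent-character swap is only one possible character-level transformation, and `toy_positive_prob`, `swap_adjacent`, and `attack` are hypothetical names introduced for illustration. The classifier is queried only through its output probability, which is what makes the setting black-box.

```python
from typing import Callable, List


def toy_positive_prob(tokens: List[str]) -> float:
    """Hypothetical stand-in classifier: probability that the text is positive."""
    positive_words = {"great", "wonderful"}
    hits = sum(token in positive_words for token in tokens)
    return min(1.0, 0.2 + 0.4 * hits)


def swap_adjacent(token: str) -> str:
    """Swap the second and third characters; tokens under 4 characters are unchanged."""
    if len(token) < 4:
        return token
    chars = list(token)
    chars[1], chars[2] = chars[2], chars[1]
    return "".join(chars)


def attack(tokens: List[str],
           prob_fn: Callable[[List[str]], float],
           budget: int) -> List[str]:
    """Perturb the `budget` tokens whose removal lowers prob_fn the most."""
    base = prob_fn(tokens)
    # Leave-one-out score: how much the output drops when token i is removed.
    # This is an assumed scoring rule, not the one defined in the paper.
    scores = [base - prob_fn(tokens[:i] + tokens[i + 1:])
              for i in range(len(tokens))]
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    perturbed = list(tokens)
    for i in ranked[:budget]:
        perturbed[i] = swap_adjacent(perturbed[i])
    return perturbed


if __name__ == "__main__":
    original = ["the", "movie", "was", "great"]
    adversarial = attack(original, toy_positive_prob, budget=1)
    print(adversarial, toy_positive_prob(original), toy_positive_prob(adversarial))
```

In this toy setup a single two-character swap in the highest-scoring token is enough to push the stand-in classifier below 0.5; a real attack would query an actual model and would likely need a larger budget.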
Today's production scale-out applications include many sub-application components, such as storage backends, logging infrastructure, and AI models. These components have drastically different characteristics, are required to work in collaboration, and interface with each other as microservices. This leads to increasingly high complexity in developing, optimizing, configuring, and deploying scale-out applications, raising the barrier to entry for most individuals and small teams. We developed a novel co-designed runtime system, Jaseci, and programming language, Jac, which aim to reduce this complexity. The key principle throughout Jaseci's design is to raise the level of abstraction by moving as much of the scale-out data management, microservice componentization, and live update complexity as possible into the runtime stack, where it can be automated and optimized. We use real-world AI applications to demonstrate Jaseci's benefit for application performance and developer productivity.
Regression test prioritization is often performed in a time-constrained execution environment in which testing only occurs for a fixed time period. For example, many organizations rely upon nightly building and regression testing of their applications every time source code changes are committed to a version control repository. This paper presents a regression test prioritization technique that uses a genetic algorithm to reorder test suites in light of testing time constraints. Experimental results indicate that, for two case study applications, our prioritization approach frequently yields higher average percentage of faults detected (APFD) values when basic-block-level coverage is used instead of method-level coverage. The experiments also reveal fundamental trade-offs in the performance of time-aware prioritization. This paper shows that our prioritization technique is appropriate for many regression testing environments and explains how the baseline approach can be extended to operate in additional time-constrained testing circumstances.
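The APFD metric named in this abstract is commonly defined, for an ordering of n tests that detects m faults, as APFD = 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2n), where TF_i is the 1-based position of the first test that reveals fault i. The Python sketch below computes that standard form so two orderings can be compared. It does not reproduce the paper's genetic algorithm or its time-aware fitness function, and the function name, test names, and fault names are hypothetical.

```python
from typing import Dict, List, Set


def apfd(order: List[str], detects: Dict[str, Set[str]]) -> float:
    """Standard APFD of a test ordering.

    order   -- test names in execution order
    detects -- maps each test name to the set of faults it reveals
    """
    faults: Set[str] = set()
    for revealed in detects.values():
        faults |= revealed
    n, m = len(order), len(faults)
    if n == 0 or m == 0:
        raise ValueError("APFD needs at least one test and one fault")

    # Record the 1-based position of the first test that reveals each fault.
    first_position: Dict[str, int] = {}
    for position, test in enumerate(order, start=1):
        for fault in detects.get(test, set()):
            first_position.setdefault(fault, position)

    if len(first_position) != m:
        raise ValueError("this sketch assumes the ordering reveals every fault")
    return 1 - sum(first_position.values()) / (n * m) + 1 / (2 * n)


if __name__ == "__main__":
    detects = {"t1": {"f1"}, "t2": {"f1", "f2"}, "t3": {"f3"}}
    print(apfd(["t1", "t2", "t3"], detects))
    print(apfd(["t2", "t3", "t1"], detects))
```

An ordering that reveals faults earlier scores higher, which is why a search procedure such as a genetic algorithm can use a metric of this kind to rank candidate orderings.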
This paper presents a technique to select a representative set of test cases from a test suite that provides the same coverage as the entire test suite. This selection is performed by identifying, and then eliminating, the redundant and obsolete test cases in the test suite. The representative set replaces the original test suite and thus potentially produces a smaller test suite. The representative set can also be used to identify those test cases that should be rerun to test the program after it has been changed. Our technique is independent of the testing methodology and only requires an association between each testing requirement and the test cases that satisfy it. We illustrate the technique using the data flow testing methodology. The reduction that is possible with our technique is illustrated by experimental results.
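As a rough illustration of selecting a representative set from a requirement-to-test association, the Python sketch below uses the classic greedy set-cover approximation: repeatedly pick the test case that satisfies the most still-uncovered requirements. This is a swapped-in technique chosen for brevity and is not necessarily the heuristic the paper describes; `reduce_suite` and the requirement and test names are hypothetical.

```python
from typing import Dict, List, Set


def reduce_suite(satisfies: Dict[str, Set[str]]) -> List[str]:
    """Greedy selection of tests that together satisfy every requirement.

    satisfies -- maps each testing requirement to the test cases that satisfy it
    """
    # Requirements with no satisfying test cannot be covered and are skipped.
    uncovered = {req for req, tests in satisfies.items() if tests}
    selected: List[str] = []
    while uncovered:
        # Count how many uncovered requirements each test would satisfy.
        gain: Dict[str, int] = {}
        for req in uncovered:
            for test in satisfies[req]:
                gain[test] = gain.get(test, 0) + 1
        # Sorting first makes ties resolve to the alphabetically smallest name.
        best = max(sorted(gain), key=lambda test: gain[test])
        selected.append(best)
        uncovered = {req for req in uncovered if best not in satisfies[req]}
    return selected


if __name__ == "__main__":
    satisfies = {
        "r1": {"t1", "t2"},
        "r2": {"t2", "t3"},
        "r3": {"t3"},
        "r4": {"t3", "t4"},
    }
    print(reduce_suite(satisfies))
```

Here four test cases shrink to two while every requirement stays satisfied. Greedy selection does not guarantee a minimum-size set, which is one reason more refined heuristics exist.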
After changes are made to a previously tested program, a goal of regression testing is to perform retesting based only on the modification while maintaining the same testing coverage as completely retesting the program. We present a novel approach to data-flow-based regression testing that uses slicing-type algorithms to explicitly detect definition-use pairs that are affected by a program change. An important benefit of our slicing technique is that, unlike previous techniques, it requires neither a data flow history nor the recomputation of data flow for the entire program to detect affected definition-use pairs. The program changes drive the recomputation of the required partial data flow through slicing. Another advantage is that our technique achieves the same testing coverage as a complete retest of the program without maintaining a test suite. Thus, the overhead of maintaining and updating a test suite is eliminated.