MapReduce is a popular programming paradigm for developing large-scale, data-intensive computation. Many frameworks that implement this paradigm have recently been developed. To leverage these frameworks, however, developers must become familiar with their APIs and rewrite existing code. We present Casper, a new tool that automatically translates sequential Java programs into the MapReduce paradigm. Casper identifies potential code fragments to rewrite and translates them in two steps: (1) Casper uses program synthesis to search for a program summary (i.e., a functional specification) of each code fragment. The summary is expressed in a high-level intermediate language resembling the MapReduce paradigm and is verified to be semantically equivalent to the original fragment using a theorem prover. (2) Casper generates executable code from the summary, using the Hadoop, Spark, or Flink API. We evaluated Casper by automatically converting real-world, sequential Java benchmarks to MapReduce. The resulting benchmarks run up to 48.2× faster than the originals.
We implemented Casper using the Polyglot framework [37] to parse Java code into an abstract syntax tree (AST). Casper traverses the program AST to identify candidate code fragments, performs program analysis, and generates target code. We now describe the Java features supported by our compiler front-end. We also discuss how Casper identifies code fragments for translation and generates executable code from the verified program summary.
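To make the idea of a MapReduce-style program summary concrete, the following is a minimal sketch in Python (not Java, and not Casper's actual intermediate language or output). It pairs a sequential loop with a hand-written map/reduce-by-key formulation of the same computation. The function names `word_count_sequential`, `map_phase`, `reduce_by_key`, and `word_count_mapreduce` are hypothetical, and equivalence is only spot-checked on a sample input rather than proven with a theorem prover.

```python
# Illustrative sketch only: a sequential loop and a map/reduce-style
# formulation of the same computation. All names here are hypothetical.

def word_count_sequential(words):
    """Sequential loop of the kind a translator might target."""
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts

def map_phase(words):
    """Map step: emit one (key, value) pair per input element."""
    return [(w, 1) for w in words]

def reduce_by_key(pairs, combine):
    """Reduce step: fold the values of each key with an associative function."""
    out = {}
    for key, value in pairs:
        out[key] = combine(out[key], value) if key in out else value
    return out

def word_count_mapreduce(words):
    """The same computation expressed as map followed by reduce-by-key."""
    return reduce_by_key(map_phase(words), lambda a, b: a + b)

if __name__ == "__main__":
    sample = ["spark", "flink", "spark", "hadoop", "spark"]
    # Spot check on one input; this is testing, not formal verification.
    assert word_count_sequential(sample) == word_count_mapreduce(sample)
    print(word_count_mapreduce(sample))
```

Because the combining function is associative and commutative, the reduce step could be distributed across workers, which is what makes this shape a natural fit for frameworks such as Hadoop, Spark, or Flink.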
The Industrial Internet of Things (I-IoT) is an extensive industrial network that interconnects various sensors and wireless devices to integrate cyber and physical systems. While I-IoT provides a considerable advantage to large-scale industrial enterprises, it is prone to significant security challenges in the form of sophisticated attacks such as the Advanced Persistent Threat (APT). APT is a serious security challenge to all kinds of networks, including I-IoT. An APT is a stealthy threat actor, typically a nation-state or state-sponsored group, that launches a cyber attack to gain unauthorized access to a computer network and remain undetected for an extended period. The latest intrusion detection systems face several challenges in detecting such complex cyber attacks in the multifarious networks of I-IoT, where unpredictable and unexpected attacks of such sophistication can have catastrophic effects. Therefore, these attacks need to be detected accurately and promptly in I-IoT. This paper presents an intelligent APT detection and classification system to secure I-IoT. After pre-processing, several machine learning (ML) algorithms are applied to detect and classify complex APT signatures accurately. The algorithms include Decision Tree, Random Forest, Support Vector Machine, Logistic Regression, Gaussian Naive Bayes, Bagging, Extreme Gradient Boosting, and AdaBoost, which are applied to the publicly available KDDCup99 dataset. Moreover, a comparative analysis is conducted among the ML algorithms to select the appropriate one for the targeted domain. The experimental results indicate that the AdaBoost classifier outperforms the others, with 99.9% accuracy and an execution time of 0.012 s for detecting APT attacks. Furthermore, the results are compared with state-of-the-art techniques, and the comparison shows the superiority of the proposed system. This system can be deployed in mission-critical scenarios in the I-IoT domain.
Parallelizing software improves its effectiveness and productivity. To guarantee correctness, the parallel and serial versions of the same code must be formally verified to be equivalent. We present a novel approach, called GRASSP, that automatically synthesizes parallel single-pass array-processing programs by treating the given serial versions as specifications. Given an arbitrary segmentation of the input array, GRASSP synthesizes code that determines a new segmentation of the array, one that allows partial results to be computed for each segment and then merged. In contrast to other parallelizers, GRASSP gradually considers several parallelization scenarios and certifies the results using constrained Horn solving. For several classes of programs, we show that such parallelization can be performed efficiently. The C++ translations of the GRASSP solutions improved performance by up to 5X relative to serial code on an 8-thread machine, and the Hadoop translations by up to 10X on a 10-node Amazon EMR cluster.
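The "partial results per segment, then merge" pattern described in the GRASSP abstract can be sketched as follows. This is a hand-written Python illustration under stated assumptions, not code synthesized by GRASSP: it uses a fixed, even segmentation rather than a computed one, it is not certified by a Horn solver, and the names `is_sorted_serial`, `summarize`, `merge`, and `is_sorted_parallel` are hypothetical. The serial single-pass program acts as the specification, and each segment's summary carries just enough boundary information (first and last element) for the merge to be correct.

```python
# Illustrative sketch only: parallelizing a single-pass array check by
# computing a partial result per segment and merging. Names are hypothetical.
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def is_sorted_serial(xs):
    """Serial single-pass specification: is xs in non-decreasing order?"""
    for i in range(1, len(xs)):
        if xs[i - 1] > xs[i]:
            return False
    return True

def summarize(segment):
    """Partial result for one segment: (sorted?, first element, last element)."""
    if not segment:
        return None  # an empty segment contributes nothing
    return (is_sorted_serial(segment), segment[0], segment[-1])

def merge(left, right):
    """Combine two adjacent partial results, checking the shared boundary."""
    if left is None:
        return right
    if right is None:
        return left
    ok_l, first_l, last_l = left
    ok_r, first_r, last_r = right
    return (ok_l and ok_r and last_l <= first_r, first_l, last_r)

def is_sorted_parallel(xs, n_segments=4):
    """Split xs into contiguous segments, summarize each, and merge in order."""
    size = max(1, -(-len(xs) // n_segments))  # ceiling division
    segments = [xs[i:i + size] for i in range(0, len(xs), size)]
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(summarize, segments))  # order is preserved
    result = reduce(merge, partials, None)
    return True if result is None else result[0]
```

The merge function is associative, so the partial results could equally be combined in a tree rather than left to right. In CPython, threads will not speed up this CPU-bound loop; the thread pool is used here only to show the structure of the computation.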