With the advent of big data, modern businesses face an increasing need to store and process large volumes of sensitive customer information in the cloud. In these environments, resources are shared across a multitude of mutually untrusting tenants, increasing the propensity for data leakage. With the recent spate of high-profile data exfiltration attacks and the emergence of critical vulnerabilities such as Heartbleed and Shellshock, coupled with the growing use of clouds in all aspects of our daily lives, this problem stands to grow further in severity. In this thesis, we present a novel network-based covert channel that can arise from shared network resources in data-center environments, even in the presence of network monitors that regulate flow destinations with NAC policies and VLAN-based isolation mechanisms. Through a series of experiments on diverse network hardware (including SDNs) and commercial clouds such as EC2 and Azure, we demonstrate that our channel achieves bit rates orders of magnitude greater than those reported in the recent literature. Furthermore, we present an information-theoretic framework to model and study the channel. Using this model, we derive an upper bound on the information rate of the channel and propose a coding scheme that nearly achieves this bound. Additionally, we introduce techniques to make the covert channel robust to noise, and we empirically study its performance in the presence of realistic cross-traffic. Finally, we discuss several avenues for mitigation and demonstrate the effectiveness of our schemes both empirically and mathematically.
Abstract: Large data processing tasks can be carried out using workflow management systems. When either the input data or the programs in the pipeline are modified, the workflow must be re-executed to ensure that the final output reflects the changes. Since such re-computation can consume substantial resources, it is desirable to optimize the system to avoid redundant computation. In a workflow, the dependency relationships between files are specified at the outset and can be leveraged to track which programs must be re-executed when particular files change; current distributed systems cannot provide such functionality when no predefined workflow exists. In this paper, we present an architecture that produces both correct output and fast re-execution by leveraging the provenance of data to propagate changes along an implicit dependency graph. We explore the tradeoff between storage and availability by presenting a performance analysis of our rollback and re-execution scheme.
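The core idea of propagating changes along a dependency graph can be sketched as follows. This is a minimal illustration, not the paper's implementation: the graph here is given explicitly (the paper derives it implicitly from provenance), and the file names and the `DependencyGraph` class are hypothetical. When one file changes, every file downstream of it is marked stale, and the stale files are recomputed in topological order so that each step runs only after all of its stale inputs have been refreshed.

```python
from collections import defaultdict, deque

class DependencyGraph:
    """Hypothetical sketch of change propagation over a file dependency DAG."""

    def __init__(self):
        # file -> set of files directly derived from it
        self.downstream = defaultdict(set)

    def add_edge(self, src, dst):
        """Record that `dst` is produced from `src`."""
        self.downstream[src].add(dst)

    def affected(self, changed):
        """Return every file reachable from `changed`, i.e. now stale."""
        stale, queue = set(), deque([changed])
        while queue:
            node = queue.popleft()
            for nxt in self.downstream[node]:
                if nxt not in stale:
                    stale.add(nxt)
                    queue.append(nxt)
        return stale

    def reexecution_order(self, changed):
        """Stale files in topological order (Kahn's algorithm on the
        stale subgraph), so each is recomputed after its stale inputs."""
        stale = self.affected(changed)
        indeg = {n: 0 for n in stale}
        for n in stale:
            for nxt in self.downstream[n]:
                if nxt in stale:
                    indeg[nxt] += 1
        queue = deque(n for n in stale if indeg[n] == 0)
        order = []
        while queue:
            n = queue.popleft()
            order.append(n)
            for nxt in self.downstream[n]:
                if nxt in stale:
                    indeg[nxt] -= 1
                    if indeg[nxt] == 0:
                        queue.append(nxt)
        return order

# Example pipeline: raw.csv -> clean.csv -> {report.txt, stats.json}
g = DependencyGraph()
g.add_edge("raw.csv", "clean.csv")
g.add_edge("clean.csv", "report.txt")
g.add_edge("clean.csv", "stats.json")

# Changing raw.csv forces clean.csv to be recomputed first, then both
# of its downstream outputs; changing report.txt forces nothing else.
print(g.reexecution_order("raw.csv"))
```

Only the transitive downstream closure of the changed file is re-executed; everything outside that closure is untouched, which is what makes selective re-execution cheaper than re-running the whole pipeline.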