Most modern data processing pipelines run on top of a distributed storage layer, and securing the whole system, and the storage layer in particular, against accidental or malicious misuse is crucial to ensuring compliance with rules and regulations. Enforcing data protection and privacy rules, however, stands at odds with the requirement to achieve ever-higher access bandwidths and processing rates in large data processing pipelines. In this work we describe our proposal for a path forward that reconciles the two goals. We call our approach "Software-Defined Data Protection" (SDP). Its premise is simple, yet powerful: decoupling often-changing policies from request-level enforcement allows distributed smart storage nodes to implement the latter at line rate. Existing and future data protection frameworks can be translated to the same hardware interface, which allows storage nodes to offload enforcement efficiently, both for company-specific rules and for regulations such as GDPR or CCPA. While SDP is a promising approach, several challenges remain to making this vision a reality. As we explain in the paper, overcoming these will require collaboration across several domains, including security, databases, and specialized hardware design.
Enforcing data protection and privacy rules within large data processing applications is becoming increasingly important, especially in the light of GDPR and similar regulatory frameworks. Most modern data processing happens on top of a distributed storage layer, and securing this layer against accidental or malicious misuse is crucial to ensuring global privacy guarantees. However, the performance overhead and the additional complexity of doing so are often assumed to be significant; in this work we describe a path forward that tackles both challenges. We propose "Software-Defined Data Protection" (SDP), an adaptation of the "Software-Defined Storage" approach to non-performance aspects: a trusted controller translates company- and application-specific policies to a set of rules deployed on the storage nodes. These, in turn, apply the rules at line rate but do not make any decisions on their own. Such an approach decouples often-changing policies from request-level enforcement and allows storage nodes to implement the latter more efficiently. Even though in-storage processing brings challenges, mainly because it can jeopardize line-rate processing, we argue that today's Smart Storage solutions can already implement the required functionality, thanks to the separation of concerns introduced by SDP. We highlight the challenges that remain, especially that of trusting the storage nodes. These must be tackled before SDP can reach widespread adoption in cloud environments.
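To make the control/enforcement split concrete, here is a minimal sketch, assuming a hypothetical flat rule format and a default-deny storage node; the class names, rule fields, and policy shape are illustrative assumptions, not the interface proposed in the paper.

```python
# Sketch of the SDP split: a trusted controller compiles policies into flat
# rules; storage nodes apply them per request and decide nothing themselves.
# All names and the rule format are illustrative assumptions.

from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    principal: str   # who may issue the request
    key_prefix: str  # which objects the rule covers


def compile_policy(policy: dict[str, list[str]]) -> list[Rule]:
    """Trusted controller: translate a high-level policy into flat rules.

    Policies change often; the recompiled rule set is pushed to every
    storage node whenever the policy changes.
    """
    return [Rule(principal, prefix)
            for principal, prefixes in policy.items()
            for prefix in prefixes]


class StorageNode:
    """Applies installed rules per request; makes no policy decisions."""

    def __init__(self) -> None:
        self.rules: list[Rule] = []

    def install(self, rules: list[Rule]) -> None:
        self.rules = rules  # atomically swapped when policies change

    def enforce(self, principal: str, key: str) -> bool:
        # Default-deny: a request passes only if some rule covers it.
        return any(r.principal == principal and key.startswith(r.key_prefix)
                   for r in self.rules)


# Example: analysts may read anonymized records, nothing else.
node = StorageNode()
node.install(compile_policy({"analyst": ["anon/"]}))
assert node.enforce("analyst", "anon/user42")      # allowed
assert not node.enforce("analyst", "raw/user42")   # denied by default
```

The design point the sketch illustrates is that the enforcement loop touches only precompiled, request-local state, which is what makes a line-rate hardware implementation on smart storage nodes plausible.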
We present CrashMonkey and Ace, a set of tools to systematically find crash-consistency bugs in Linux file systems. CrashMonkey is a record-and-replay framework which tests a given workload on the target file system by simulating power-loss crashes while the workload is being executed, and checking if the file system recovers to a correct state after each crash. Ace automatically generates all the workloads to be run on the target file system. We build CrashMonkey and Ace based on a new approach to test file-system crash consistency: bounded black-box crash testing (B³). B³ tests the file system in a black-box manner using workloads of file-system operations. Since the space of possible workloads is infinite, B³ bounds this space based on parameters such as the number of file-system operations or which operations to include, and exhaustively generates workloads within this bounded space. B³ builds upon insights derived from our study of crash-consistency bugs reported in Linux file systems in the last 5 years. We observed that most reported bugs can be reproduced using small workloads of three or fewer file-system operations on a newly created file system, and that all reported bugs result from crashes after fsync()-related system calls. CrashMonkey and Ace are able to find 24 out of the 26 crash-consistency bugs reported in the last 5 years. Our tools also revealed 10 new crash-consistency bugs in widely used, mature Linux file systems, 7 of which existed in the kernel since 2014. Additionally, our tools found a crash-consistency bug in a verified file system, FSCQ. The new bugs result in severe consequences like broken rename atomicity, loss of persisted files and directories, and data loss.
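As a rough illustration of the bounded-exhaustive idea, the following sketch enumerates every workload of up to a fixed number of operations over a toy operation set; the operation names and the fsync() filter are assumptions for illustration and do not reproduce Ace's actual generator.

```python
# Illustrative B3-style bounded workload generation: enumerate all sequences
# of at most max_ops operations drawn from a small, fixed operation set.
# The operation set is a toy assumption, not Ace's real vocabulary.

from itertools import product

OPS = ("creat(A)", "write(A)", "link(A,B)", "fsync(A)")


def generate_workloads(max_ops: int = 3):
    """Exhaustively yield every operation sequence of length <= max_ops.

    Following the paper's observation that all reported bugs involve
    crashes after fsync()-related calls, sequences without an fsync()
    are skipped.
    """
    for length in range(1, max_ops + 1):
        for seq in product(OPS, repeat=length):
            if any(op.startswith("fsync") for op in seq):
                yield seq


# Each yielded workload would then be run under a CrashMonkey-style harness
# that simulates a power-loss crash after each persistence point and checks
# that the file system recovers to a correct state.
for workload in generate_workloads(max_ops=2):
    print(workload)
```

Bounding the sequence length to three operations keeps the enumeration tractable while, per the paper's bug study, still covering the workload sizes that reproduce most reported bugs.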
We present Dinomo, a novel key-value store for disaggregated persistent memory (DPM). Dinomo is the first key-value store for DPM that simultaneously achieves high common-case performance, scalability, and lightweight online reconfiguration. We observe that previously proposed key-value stores for DPM had architectural limitations that prevent them from achieving all three goals simultaneously. Dinomo uses a novel combination of techniques such as ownership partitioning, disaggregated adaptive caching, selective replication, and lock-free and log-free indexing to achieve these goals. Compared to a state-of-the-art DPM key-value store, Dinomo achieves at least 3.8X better throughput at scale on various workloads and higher scalability, while providing fast reconfiguration.
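To give a flavor of one of the named techniques, the sketch below shows a simple form of ownership partitioning, in which each node deterministically owns a disjoint slice of the key space so that common-case requests need no cross-node coordination; the hashing scheme and node names are assumptions for illustration, not Dinomo's actual design.

```python
# Hypothetical sketch of ownership partitioning: every key has exactly one
# owning node, computed locally from the key alone, so the common path
# involves no coordination. The hash-based mapping is an assumption.

import hashlib

NODES = ["kvs-node-0", "kvs-node-1", "kvs-node-2"]


def owner(key: str, nodes: list[str] = NODES) -> str:
    """Deterministically map a key to the single node that owns it."""
    digest = hashlib.sha256(key.encode()).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


# During reconfiguration, only remapped key ranges change owners, which is
# what keeps ownership handoff lightweight relative to reshuffling all data.
print(owner("user:42"))
```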