Network coding approaches typically consider an unrestricted recoding of coded packets in the relay nodes to increase performance. However, this can expose the system to pollution attacks that go undetected during transmission until the receivers attempt to recover the data. To prevent these attacks while allowing for the benefits of coding in mesh networks, the cache coding protocol was proposed. This protocol only allows recoding at a relay once that relay has received enough coded packets to decode an entire generation of packets. At that point, the relay node recodes and signs the recoded packets with its own private key, allowing the system to detect and minimize the effect of pollution attacks and making the relays accountable for changes to the data. This paper analyzes the delay performance of cache coding to understand the security-performance trade-off of this scheme. We introduce an analytical model for the case of two relays in an erasure channel relying on an absorbing Markov chain and an approximate model to estimate the performance in terms of the number of transmissions before successfully decoding at the receiver. We confirm our analysis using simulation results. We show that cache coding can overcome the security issues of unrestricted recoding with only a moderate decrease in system performance.
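To illustrate the kind of quantity an absorbing Markov chain yields here, the following is a minimal sketch for a much simpler setting than the paper's two-relay model: a single sender-receiver link over an erasure channel. It assumes (these are illustrative assumptions, not taken from the paper) that every non-erased coded packet is innovative, that state i counts the innovative packets received so far, and that state `generation_size` is absorbing. The function name and parameters are hypothetical.

```python
def expected_transmissions(generation_size, erasure_prob):
    """Expected number of transmissions until absorption from each state.

    Toy single-link chain (an assumption, not the paper's two-relay model):
    from state i the chain stays in i with probability erasure_prob and
    moves to i + 1 otherwise. State generation_size is absorbing.
    """
    if generation_size < 0:
        raise ValueError("generation_size must be non-negative")
    if not 0.0 <= erasure_prob < 1.0:
        raise ValueError("erasure_prob must be in [0, 1)")

    success = 1.0 - erasure_prob
    # t[i] is the expected number of transmissions left when i packets
    # have been received; the absorbing state needs none.
    t = [0.0] * (generation_size + 1)
    for i in range(generation_size - 1, -1, -1):
        # First-step analysis: t[i] = 1 + erasure_prob * t[i] + success * t[i + 1]
        t[i] = (1.0 + success * t[i + 1]) / success
    return t


if __name__ == "__main__":
    # Starting from state 0, this toy chain gives generation_size / (1 - erasure_prob).
    print(expected_transmissions(4, 0.2)[0])
```

Under these assumptions the chain is a simple birth process, so the expected times can be obtained by back-substitution; a model with relays that must decode before recoding would need a larger state space and a general linear solve.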
This paper proposes Yggdrasil, a protocol for privacy-aware dual data deduplication in multi-client settings. Yggdrasil is designed to reduce the cloud storage space while safeguarding the privacy of the client's outsourced data. Yggdrasil combines three innovative tools to achieve this goal. First, generalized deduplication, an emerging technique to reduce data footprint. Second, non-deterministic transformations that are described compactly and improve the degree of data compression in the Cloud (across users). Third, data preprocessing in the clients in the form of lightweight, privacy-driven transformations prior to upload. This guarantees that an honest-but-curious Cloud service trying to retrieve the client's actual data will face a high degree of uncertainty as to what the original data is. We provide a mathematical analysis of the measure of uncertainty as well as the compression potential of our protocol. Our experiments with an HDFS log data set show that 49% overall compression can be achieved, with clients storing only 12% for privacy and the Cloud storing the rest. This is achieved while ensuring that each fragment uploaded to the Cloud would have 10^293 possible original strings from the client. Higher uncertainty is possible, with some reduction of compression potential.
Cloud Service Providers (CSPs) offer a vast amount of storage space at competitive prices to cope with the growing demand for digital data storage. Dual deduplication is a recent framework designed to improve data compression on the CSP while keeping clients' data private from the CSP. To achieve this, clients perform lightweight information-theoretic transformations to their data prior to upload. We investigate the effectiveness of dual deduplication, and propose an improvement for the existing state-of-the-art method. We name our proposal Bonsai as it aims at reducing storage fingerprint and improving scalability. In detail, Bonsai achieves (1) a significant reduction in client storage, (2) a reduction in total required storage (client + CSP), and (3) a reduction in deduplication time on the CSP. Our experiments show that Bonsai achieves compression rates of 68% on the cloud and 5% on the client, while allowing the cloud to identify deduplications in a time-efficient manner. We also show that combining our method with universal compressors in the cloud, e.g., Brotli, can yield better overall compression on the data compared to only applying the universal compressor or plain Bonsai. Finally, we show that Bonsai and its variants provide sufficient privacy against an honest-but-curious CSP that knows the distribution of the clients' original data.