Being able to identify records that correspond to the same entity across diverse databases is an increasingly important step in many data analytics projects. Research into privacy-preserving record linkage (PPRL) aims to develop techniques that can link records across databases such that, besides the record pairs classified as matches, no sensitive information about the entities in these databases is revealed. A popular technique used in PPRL is to encode sensitive values into Bloom filters (bit vectors), which has the advantage of allowing approximate matching using character q-grams. PPRL based on Bloom filter encoding has been shown to be accurate and scalable to large databases, and is thus now being used in real-world PPRL systems in Australia, Canada and the UK. However, recent studies have shown that Bloom filters used for PPRL are vulnerable to cryptanalysis attacks that can re-identify some of the sensitive values encoded in these Bloom filters. While previous such attack methods were slow and required knowledge of various encoding parameters, we present a novel efficient attack which exploits how attribute values are encoded into Bloom filters. Our attack method does not require knowledge of the encoding function or of the parameter settings used. It is able to correctly identify, with high precision, q-grams that could not have been hashed to certain Bloom filter bit positions, and using these q-grams it can then re-identify attribute values with high precision. Our method is significantly faster than earlier PPRL cryptanalysis attacks, and in our experimental evaluation it is able to successfully re-identify attribute values from large real-world databases in a few minutes.
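To make the encoding step concrete, the following is a minimal sketch of how a string value might be split into character q-grams, hashed into a Bloom filter, and compared approximately. It is an illustration under stated assumptions, not the encoding used by any particular PPRL system: the double-hashing scheme, the choice of SHA-1 and SHA-256, the parameter values, and the function names `qgrams`, `bloom_encode`, and `dice` are all assumptions made for this example.

```python
import hashlib


def qgrams(value, q=2):
    """Return the set of character q-grams of a lower-cased string."""
    value = value.lower()
    return {value[i:i + q] for i in range(len(value) - q + 1)}


def bloom_encode(value, num_bits=1000, num_hashes=2, q=2):
    """Encode a value as the set of bit positions that are set to 1.

    Assumption: each q-gram is mapped to `num_hashes` positions via
    double hashing, (h1 + i * h2) mod num_bits.
    """
    bits = set()
    for gram in qgrams(value, q):
        data = gram.encode("utf-8")
        h1 = int(hashlib.sha1(data).hexdigest(), 16)
        h2 = int(hashlib.sha256(data).hexdigest(), 16)
        for i in range(num_hashes):
            bits.add((h1 + i * h2) % num_bits)
    return bits


def dice(bits_a, bits_b):
    """Dice coefficient of two Bloom filters given as sets of 1-bit positions."""
    if not bits_a and not bits_b:
        return 1.0
    return 2 * len(bits_a & bits_b) / (len(bits_a) + len(bits_b))


if __name__ == "__main__":
    smith = bloom_encode("smith")
    smyth = bloom_encode("smyth")
    jones = bloom_encode("jones")
    # Values sharing q-grams share bit positions, so they score higher.
    print("smith vs smyth:", dice(smith, smyth))
    print("smith vs jones:", dice(smith, jones))
```

Because similar strings share q-grams, they set many of the same bit positions, which is what allows approximate matching on the encoded values. The same property is what an attacker can exploit: a bit position that is 0 in a filter rules out every q-gram that would have been hashed to it.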
Modern power systems depend on Cyber-Physical Systems (CPSs) to link physical devices and control technologies. A major concern in the implementation of smart power networks is to minimize the risk of data privacy violations (e.g., by adversaries using data poisoning and inference attacks). In this paper, we propose a privacy-preserving framework to achieve both privacy and security in smart power networks. The framework includes two main modules: a two-level privacy module and an anomaly detection module. In the two-level privacy module, a blockchain based on an enhanced Proof of Work (ePoW) technique is designed to verify data integrity and mitigate data poisoning attacks, and a Variational AutoEncoder (VAE) is simultaneously applied to transform data into an encoded format that prevents inference attacks. In the anomaly detection module, a Long Short-Term Memory (LSTM) deep learning technique is trained and validated on the outputs of the two-level privacy module using two public datasets. The results highlight that the proposed framework can efficiently protect data of smart power networks and discover abnormal behaviors, in comparison to several state-of-the-art techniques.
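As a rough illustration of how proof of work can support data integrity checks, the sketch below mines a nonce for a data record and later verifies that the record has not been altered. It shows plain hash-based proof of work only; the abstract does not describe how the enhanced variant (ePoW) differs, so nothing here should be read as the paper's method. The function names `mine` and `verify`, the record format, and the difficulty setting are assumptions made for this example.

```python
import hashlib


def _digest(payload, nonce):
    """SHA-256 hex digest of a payload string combined with a nonce."""
    return hashlib.sha256(f"{payload}|{nonce}".encode("utf-8")).hexdigest()


def mine(payload, difficulty=3):
    """Search for a nonce whose digest starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = _digest(payload, nonce)
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


def verify(payload, nonce, digest, difficulty=3):
    """Check that the digest matches the payload and meets the difficulty target."""
    return _digest(payload, nonce) == digest and digest.startswith("0" * difficulty)


if __name__ == "__main__":
    record = "meter=17;voltage=230.1"  # hypothetical sensor reading
    nonce, digest = mine(record)
    print("valid record:", verify(record, nonce, digest))
    # A tampered (poisoned) record no longer matches the stored digest.
    print("tampered record:", verify("meter=17;voltage=999.9", nonce, digest))
```

Any change to the record changes its digest, so a poisoned record fails verification unless the attacker redoes the mining work.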