The upgrade of the electricity network to the "smart grid" has intensified in recent years. The new automated devices being deployed gather large quantities of data that promise a more resilient grid but also raise privacy concerns among customers and energy distributors. In this paper, we focus on the energy consumption traces that smart meters generate, and especially on the risk that individual customers can be identified given a large dataset of these traces. This question has been raised in the related literature and is an important privacy research topic. We present an overview of current research on privacy in the Advanced Metering Infrastructure. We formalize the problem of de-anonymization by matching low-frequency and high-frequency smart metering datasets, and we build a threat model for this problem. Finally, we investigate the characteristics of these datasets in order to make them more resilient to the de-anonymization process. Our methodology can be used by electricity companies to better understand the properties of their smart metering datasets and the conditions under which such datasets can be released to third parties.
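To make the idea of matching low-frequency and high-frequency datasets concrete, here is a minimal sketch under stated assumptions. It is not the formalization from the paper. It assumes that pseudonymous high-frequency traces (e.g., 15-minute readings) can be summed to the resolution of an identified low-frequency dataset (e.g., billing periods). It also assumes each pseudonym is linked to the identity whose low-frequency profile is nearest in L1 distance. The function names `aggregate`, `distance`, and `match` and the dictionary layout are hypothetical and chosen for illustration.

```python
from typing import Dict, List


def aggregate(trace: List[float], factor: int) -> List[float]:
    """Sum consecutive blocks of `factor` high-frequency readings into one
    low-frequency reading (hypothetical helper, assumes aligned periods)."""
    if factor <= 0 or len(trace) % factor != 0:
        raise ValueError("trace length must be a positive multiple of factor")
    return [sum(trace[i:i + factor]) for i in range(0, len(trace), factor)]


def distance(a: List[float], b: List[float]) -> float:
    """L1 distance between two equally long consumption profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))


def match(high_freq: Dict[str, List[float]],
          low_freq: Dict[str, List[float]],
          factor: int) -> Dict[str, str]:
    """Link each pseudonym in `high_freq` to the identity in `low_freq`
    whose profile is closest to the pseudonym's aggregated trace.
    This is a greedy nearest-neighbour assignment, so two pseudonyms may
    be linked to the same identity."""
    result: Dict[str, str] = {}
    for pseudonym, trace in high_freq.items():
        profile = aggregate(trace, factor)
        result[pseudonym] = min(
            low_freq, key=lambda ident: distance(profile, low_freq[ident])
        )
    return result


if __name__ == "__main__":
    high = {"p1": [1.0, 2.0, 3.0, 4.0], "p2": [5.0, 5.0, 0.0, 0.0]}
    low = {"alice": [3.0, 7.0], "bob": [10.0, 0.0]}
    print(match(high, low, factor=2))  # {'p1': 'alice', 'p2': 'bob'}
```

With a factor of 2, the trace of `p1` aggregates to `[3.0, 7.0]`, which exactly matches `alice`, and the trace of `p2` aggregates to `[10.0, 0.0]`, which matches `bob`. A stricter variant could enforce a one-to-one assignment, for example with the Hungarian algorithm, instead of this greedy choice.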
In the transition to the smart grid, electricity networks are becoming more data-intensive as more data-producing devices are deployed, which increases both the opportunities and the challenges in how the collected data are used. For example, in the Advanced Metering Infrastructure (AMI), the devices and their data reveal more about the operational parameters of the environment. They also reveal details about the habits of the people living in the houses monitored by smart meters. Different anonymization techniques have been proposed to minimize privacy concerns, among them the use of pseudonyms. In this work we return to the question of the effectiveness of pseudonyms by investigating how a previously reported methodology for de-pseudonymization performs on a larger and more realistic dataset than was previously used. We also propose our own, simpler de-pseudonymization methodology and compare the results. Our results indicate, not surprisingly, that large realistic datasets are very important for properly understanding how an experimental method performs. Results based on small datasets risk not being generalizable. In particular, we show that the number of households re-identified by breaking pseudonyms depends on the size of the dataset and on the period during which the pseudonyms remain constant. In the smart grid setting, results even vary with the season in which the dataset was captured. Knowing that relatively simple changes in the data collection procedure may significantly increase resistance to de-anonymization attacks will help future AMI deployments.