The massive growth of the Internet of Things (IoT) as a network of interconnected entities [18] brings new privacy and security challenges to the traditional software engineering domain [4]. To protect individuals' privacy, the FTC's Fair Information Practice Principles (FIPPs) [6] recommend that companies give consumers notice of their data practices, offer them choices, and give them the means to control their own data. Privacy policies are the most common form of such notice. However, privacy policies are generally not effective, for two main reasons: first, they are long and full of legal jargon that an ordinary user cannot understand; second, there is no guarantee that an IoT device behaves as its privacy policy describes. In this technical report, we propose and discuss our methodologies for analyzing privacy policies. Using this analysis, we shorten a privacy policy and organize it by privacy practice so that users can understand it more easily. We also present a method for finding inconsistencies between IoT devices and their privacy policies.
A privacy policy is a document that states how a company intends to handle and manage its customers' personal data. One problem with these policies is that their content might violate data privacy regulations. Because of the enormous number of privacy policies that exist, the only realistic way to check all of them for legal inconsistencies is through an automated method. In this work, we use Natural Language Inference (NLI) techniques to compare privacy regulations against sections of privacy policies from a selection of large companies. Our NLI model uses pre-trained embeddings, along with BiLSTM in its attention mechanism. We tried two versions of our model: one trained on the Stanford Natural Language Inference (SNLI) dataset and the other on the Multi-Genre Natural Language Inference (MNLI) dataset. Test accuracy was higher for the model trained on SNLI, but on NLI tasks over real-world privacy policies, the model trained on MNLI generalized and performed much better.
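The comparison step described above can be pictured as pairing each regulation statement (the premise) with each policy segment (the hypothesis) and collecting the pairs labeled as contradictions. The sketch below is only an illustration under stated assumptions: `toy_nli` is a hypothetical keyword-overlap stand-in for a trained NLI model, not the BiLSTM model from the abstract, and the names `Finding`, `check_policy`, and `contradictions` are invented for this example.

```python
import re
from typing import Callable, Iterable, List, NamedTuple

# The three standard NLI labels used by SNLI and MNLI.
LABELS = ("entailment", "neutral", "contradiction")
NEGATIONS = {"not", "never", "no"}


class Finding(NamedTuple):
    regulation: str       # premise
    policy_segment: str   # hypothesis
    label: str            # one of LABELS


def _tokens(text: str) -> set:
    """Lowercase word set; punctuation is discarded."""
    return set(re.findall(r"[a-z']+", text.lower()))


def toy_nli(premise: str, hypothesis: str) -> str:
    """Hypothetical stand-in for a trained NLI classifier (assumption).

    Little shared vocabulary -> neutral; shared vocabulary with a
    negation on exactly one side -> contradiction; otherwise entailment.
    """
    p, h = _tokens(premise), _tokens(hypothesis)
    shared = (p & h) - NEGATIONS
    if len(shared) < 3:
        return "neutral"
    if bool(p & NEGATIONS) != bool(h & NEGATIONS):
        return "contradiction"
    return "entailment"


def check_policy(
    regulations: Iterable[str],
    segments: Iterable[str],
    classify: Callable[[str, str], str] = toy_nli,
) -> List[Finding]:
    """Label every (regulation, policy segment) pair with the classifier."""
    segments = list(segments)
    return [
        Finding(reg, seg, classify(reg, seg))
        for reg in regulations
        for seg in segments
    ]


def contradictions(findings: Iterable[Finding]) -> List[Finding]:
    """Keep only the pairs flagged as potential legal inconsistencies."""
    return [f for f in findings if f.label == "contradiction"]


# Made-up example sentences, not taken from any real regulation or policy.
regulations = ["The company must not sell personal data without consent"]
segments = [
    "We sell personal data without consent to partners",
    "We use cookies to remember your language settings",
    "We do not sell personal data without consent",
]
findings = check_policy(regulations, segments)
flagged = contradictions(findings)
```

In practice the `classify` argument would be replaced by a model trained on SNLI or MNLI; keeping it as a plain callable lets the pairing logic stay unchanged when the classifier is swapped.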