Privacy policies outline the data collection and sharing practices an organization follows, together with the choice and control measures available to users to manage the process. However, users often need help reading and understanding such documents, even though they are written in natural language. The fundamental problems with privacy policies persist despite advancements in privacy design, frameworks, and regulations. To identify why privacy policies remain persistently challenging to comprehend, it is vital to investigate historical policy patterns and understand how privacy policies have evolved in terms of information packaging and presentation. To this end, we create a sentence-level classifier to conduct a large-scale longitudinal analysis of privacy policies from 130,604 organizations, totaling approximately one million policies from 1997 to 2019. To implement the classifier, we annotate 10,717 sentences from 115 policies in the OPP-115 corpus and then use those annotations to train XLNet and BERT classifiers. Results from our analysis reveal that specific data practice categories experience more frequent policy changes than others, making it challenging to track relevant information over time. In addition, we discover that every category has distinct composition, readability, and structural issues, which are exacerbated when categories frequently co-occur in a document. Based on our observations, we provide recommendations for policy articulation and revision to give privacy policy documents better coherence and structure.
Websites are used regularly in our day-to-day lives, yet research has shown that many users find it challenging to use them securely, most prominently because of the weak passwords through which they access their accounts. At the same time, many services employ low-security measures, making their users even more prone to account compromises with little to no means of remediating compromised accounts. Additionally, remediating a compromised account requires users to complete a series of steps, ideally all provided and explained by the service. However, for U.S.-based websites, prior research has shown that the advice provided by many services is often incomplete. To further understand the underlying issue and its implications, this paper reports on a study that analyzes the account remediation procedures of the 50 most popular websites in 30 countries, 6 each in Africa, the Americas, Asia, Europe, and Oceania. We conducted the first transcontinental analysis of the account remediation protocols of popular websites. The analysis is based on five steps for which websites need to provide advice: compromise discovery, account recovery, access limitation, service restoration, and prevention. We find that the lack of advice that prior work identified for websites from the U.S. also holds across continents, with the presence of advice ranging from 37% to 77% on average. Additionally, we identified considerable differences when comparing countries and continents, with countries in Africa and Oceania significantly more affected by the lack of advice. To address this, we suggest providing publicly available and easy-to-follow remediation advice for users, as well as guidance for website providers so they can provide all the necessary information.