Mobile applications frequently access sensitive personal information to meet user or business requirements. Because this information is sensitive, regulators increasingly require mobile-app developers to publish privacy policies that describe what information is collected, and they have fined companies whose policies are inconsistent with their apps' actual data practices. To help mobile-app developers check their privacy policies against their apps' code for consistency, we propose a semi-automated framework that combines a map linking policy phrases to the API methods that produce sensitive information with information flow analysis to detect misalignments. We present an implementation of our framework based on a privacy-policy-phrase ontology and a collection of mappings from API methods to policy phrases. Our empirical evaluation of 477 top Android apps discovered 341 potential privacy policy violations.
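The consistency check described above can be sketched in miniature: given a map from sensitive API methods to policy phrases, flag any called method whose mapped phrase never appears in the policy text. All method names and phrases below are illustrative assumptions, not the paper's actual mapping or ontology.

```python
# Hypothetical sketch of the policy-phrase/API-method consistency check.
# The mapping entries and policy text are invented for illustration.
API_TO_PHRASE = {
    "TelephonyManager.getDeviceId": "device identifier",
    "LocationManager.getLastKnownLocation": "location information",
    "AccountManager.getAccounts": "account information",
}

def find_potential_violations(called_apis, policy_text):
    """Return API methods whose mapped policy phrase is absent from the policy."""
    policy = policy_text.lower()
    return [api for api in called_apis
            if api in API_TO_PHRASE and API_TO_PHRASE[api] not in policy]

policy = "We collect your device identifier to provide our services."
calls = ["TelephonyManager.getDeviceId", "LocationManager.getLastKnownLocation"]
print(find_potential_violations(calls, policy))
# Flags the location API, whose mapped phrase is missing from the policy.
```

A real implementation would resolve phrase synonyms through the ontology and trace actual information flows rather than matching literal strings.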
Personal data is increasingly collected and used by companies to tailor services to users, and to make financial, employment, and health-related decisions about individuals. When personal data is inappropriately collected or misused, however, individuals may experience violations of their privacy. Historically, government regulators have relied on the concept of risk in energy, aviation, and medicine, among other domains, to determine the extent to which products and services may harm the public. To address privacy concerns in government-controlled information technology, government agencies are advocating the adaptation of similar risk management frameworks to privacy. Despite the recent shift toward a risk-managed approach to privacy, to our knowledge there are no empirical methods to determine which personal data are most at risk and which contextual factors increase or decrease that risk. To this end, we introduce an empirical framework in this article that consists of factorial vignette surveys that can be used to measure the effect of different factors and their levels on privacy risk. We report a series of experiments that use the proposed framework to measure perceived privacy risk, which we base on expressed preferences and define as an individual's willingness to share their personal data with others given the likelihood of a potential privacy harm. These experiments control for one or more of six factors affecting an individual's willingness to share their information: data type, computer type, data purpose, privacy harm, harm likelihood, and individual demographic factors, such as age range, gender, education level, ethnicity, and household income. To measure likelihood, we introduce and evaluate a new likelihood scale based on construal level theory in psychology. The scale frames individual attitudes about risk likelihood based on social and physical distance to the privacy harm.
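A factorial vignette design like the one described crosses every level of every factor to produce the set of survey conditions. The factor levels below are invented placeholders, not the paper's instrument, and only three of the six factors are shown for brevity.

```python
from itertools import product

# Hypothetical factor levels for a full factorial vignette design;
# the level names are illustrative, not the study's actual vignettes.
FACTORS = {
    "data_type": ["location", "health record"],
    "privacy_harm": ["surveillance", "induced disclosure"],
    "harm_likelihood": ["rare", "frequent"],
}

def vignette_conditions(factors):
    """Enumerate every combination of factor levels (a full factorial design)."""
    names = list(factors)
    return [dict(zip(names, levels))
            for levels in product(*(factors[n] for n in names))]

conditions = vignette_conditions(FACTORS)
print(len(conditions))  # 2 * 2 * 2 = 8 distinct vignette conditions
```

Each resulting dictionary corresponds to one vignette shown to a participant, whose willingness-to-share response is then modeled against the factor levels.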
The findings include predictions about the extent to which the above factors correspond to risk acceptance, including that perceived risk is lower for induced disclosure harms when compared to surveillance and insecurity harms as defined in Solove's Taxonomy of Privacy. We also found that participants are more willing to share their information when they perceive the benefits of sharing. In addition, we found that likelihood was not a multiplicative factor in computing privacy risk perception, which challenges conventional theories of privacy risk in the privacy and security community.
Privacy policies describe high-level goals for corporate data practices; regulators require industries to make conspicuous, accurate privacy policies available to their customers. Consequently, software requirements must conform to those privacy policies. To help stakeholders extract privacy goals from policies, we introduce a semi-automated framework that combines crowdworker annotations, natural language typed dependency parses, and a reusable lexicon to improve goal-extraction coverage, precision, and recall. The framework evaluation consists of a five-policy corpus governing web and mobile information systems, yielding an average precision of 0.73 and recall of 0.83. The results show that no single framework element alone is sufficient to extract goals; the overall framework, however, compensates for each element's limitations. Human annotators adapt readily to discovering annotations in new texts, but their annotations can be inconsistent and incomplete; dependency parsers lack sophisticated, tacit knowledge, but they can exhaustively search text for prospective requirements indicators; and while the lexicon may never completely saturate, its terms can be reliably used to improve recall. Lexical reuse reduces false negatives by 41%, increasing the average recall to 0.85. Finally, crowd workers removed around 80% of false positives, improving the average precision to 0.93.
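The lexicon's role in the framework above can be illustrated with a minimal sketch: scan policy sentences for lexicon terms that indicate data-practice goals and surface the matching sentences as candidate goals. The lexicon terms and sentences below are assumptions for illustration, not the paper's actual lexicon.

```python
import re

# Illustrative sketch, not the paper's implementation: flag policy
# sentences containing hypothetical goal-indicator lexicon terms.
LEXICON = {"collect", "share", "use", "store", "track"}

def extract_candidate_goals(policy_sentences):
    """Return (sentence, matched terms) pairs as prospective goal indicators."""
    candidates = []
    for sentence in policy_sentences:
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        hits = sorted(LEXICON & words)
        if hits:
            candidates.append((sentence, hits))
    return candidates

sentences = [
    "We collect your email address to create an account.",
    "Our offices are located in Pittsburgh.",
]
print(extract_candidate_goals(sentences))
# Only the first sentence matches a lexicon term ("collect").
```

In the full framework, such exhaustive lexical matching trades precision for recall; the crowdworker review stage then filters the false positives that literal matching introduces.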