2021
DOI: 10.2478/popets-2021-0081

Unifying Privacy Policy Detection

Abstract: Privacy policies have become a focal point of privacy research. With their goal to reflect the privacy practices of a website, service, or app, they are often the starting point for researchers who analyze the accuracy of claimed data practices, user understanding of practices, or control mechanisms for users. Due to vast differences in structure, presentation, and content, it is often challenging to extract privacy policies from online resources like websites for analysis. In the past, researchers have relied…

Cited by 7 publications (4 citation statements)
References 57 publications
“…For all corpora, we applied the best practices for privacy policy preprocessing identified in our earlier work [34]. Following these, we used the Boilerpipe text extractor with the NumWordsRules-Extractor setting [38] to obtain the plain text of privacy policies from web pages, determined the languages of the texts by applying a majority voting scheme on the results of multiple language detection libraries, and identified non-privacy policies by applying trained classifiers [34] that achieved F1 scores of 99.1 % and 99.8 % for English and German. For each data collection time point, we manually inspected a random sample of 10 % of the downloaded policies and the final output for correctness and found no issues.…”
Section: Text Preprocessing
Citation type: mentioning
confidence: 99%
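
The majority voting step mentioned in the statement above can be illustrated with a minimal sketch. This is not the citing authors' code: the function name `majority_language`, the detector callables, and the tie-handling rule are assumptions made for illustration. Each detector is assumed to be a function that takes text and returns a language code such as "en" or "de".

```python
from collections import Counter
from typing import Callable, Iterable, Optional


def majority_language(
    text: str, detectors: Iterable[Callable[[str], Optional[str]]]
) -> Optional[str]:
    """Return the language most detectors agree on, or None if undecided.

    `detectors` is a hypothetical list of wrappers around language
    detection libraries; each returns a language code or None.
    """
    votes: Counter = Counter()
    for detect in detectors:
        try:
            label = detect(text)
        except Exception:
            # Assumption: a failing detector simply abstains from the vote.
            continue
        if label:
            votes[label] += 1

    if not votes:
        return None

    ranked = votes.most_common(2)
    # Assumption: a tie between the top two labels counts as "undecided".
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]
```

In practice each detector would wrap a real language detection library; treating ties and errors as abstentions is one possible design choice, not something the statement specifies.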
“…Ambiguity is further introduced by the need to have complete information in these policies [13,40]. In addition, many users need help retrieving specific information from a policy [28], and sometimes locate where the policy is mentioned in a website [30,32,34]. Most of these prior works point to critical challenges of privacy policies; however, they focused on analyzing a single snapshot of the policies.…”
Section: Related Work, 2.1 Privacy Policy Challenges
Citation type: mentioning
confidence: 99%
“…A growing body of literature has tackled the problem to automatically find privacy policies on websites and download them for further analysis. Hosseini et al [28] discuss and evaluate different approaches and identify best practices. Recent research has also shown growing interest in (cookie) consent notices.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Thus, any website collecting such information must have a privacy policy that explains the use of visitors' personal data. To determine if a website had a privacy policy, we followed best practices identified by Hosseini et al [28] and searched for privacy policy specific words in and around HTML link tags. For this, we extended a list of common words for privacy policy links, terms-of-service, and contact pages from a recent study [55] to cover all official EU languages.…”
Section: No Privacy Policy
Citation type: mentioning
confidence: 99%
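
The keyword search over link tags described in the last statement above might look roughly like the following sketch. It is an illustration under stated assumptions, not the method of the citing paper or of Hosseini et al.: the `KEYWORDS` tuple is a tiny hypothetical stand-in for the multilingual word list, the names `PolicyLinkFinder` and `find_policy_links` are invented here, and only the link target and link text are checked (text around the tag is ignored).

```python
from html.parser import HTMLParser
from typing import List, Optional

# Hypothetical, deliberately short keyword list; the cited work uses a
# much larger list covering all official EU languages.
KEYWORDS = ("privacy", "datenschutz", "data protection")


class PolicyLinkFinder(HTMLParser):
    """Collect hrefs of <a> tags whose href or text contains a keyword."""

    def __init__(self) -> None:
        super().__init__()
        self.matches: List[str] = []
        self._href: Optional[str] = None  # href of the <a> currently open
        self._text: List[str] = []        # text pieces inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            haystack = (self._href + " " + "".join(self._text)).lower()
            if any(keyword in haystack for keyword in KEYWORDS):
                self.matches.append(self._href)
            self._href = None


def find_policy_links(html: str) -> List[str]:
    """Return hrefs of links that look like privacy policy links."""
    finder = PolicyLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.matches
```

An empty result for a page would then be one signal that the site has no discoverable privacy policy, subject to the coverage of the keyword list.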