2020
DOI: 10.48550/arxiv.2004.11131
Preprint

Privacy at Scale: Introducing the PrivaSeer Corpus of Web Privacy Policies

Cited by 3 publications (7 citation statements)
References 0 publications
“…The taxonomy covers privacy practices mentioned in the privacy policies of the websites. In the past few years, several works have trained classifiers using the taxonomy for automated policy analysis [22,36,43,45,47]. Researchers have also used automated policy analysis to check for consistency within the policy [5,6], as well as consistency with the code [51,52].…”
Section: Google Data Safety Section (mentioning)
confidence: 99%
“…Following prior works [43,45], we use DistilBERT [40] to train the classification models. We then use these models to extract privacy practices from policies.…”
Section: Privacy Policy Analysis (mentioning)
confidence: 99%
“…The privacy policies for the PrivaSeer search engine come from the PrivaSeer Corpus [21,22]. Srinath et al. built the PrivaSeer Corpus using two separate crawls of the web.…”
Section: Data Collection (mentioning)
confidence: 99%
“…The PrivaSeer Corpus was created by training a random forest model to classify whether a document is a privacy policy. Srinath et al. [21] labeled 1000 crawled documents as either a privacy policy or not. We used the labeled data to train a machine learning model and obtained the probability of a document being a privacy policy.…”
Section: Ranking (mentioning)
confidence: 99%