Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d18-1387

Automatic Detection of Vague Words and Sentences in Privacy Policies

Abstract: Website privacy policies represent the single most important source of information for users to gauge how their personal data are collected, used and shared by companies. However, privacy policies are often vague and people struggle to understand the content. Their opaqueness poses a significant challenge to both users and policy regulators. In this paper, we seek to identify vague content in privacy policies. We construct the first corpus of human-annotated vague words and sentences and present empirical stud…

Cited by 25 publications (53 citation statements). References 21 publications (18 reference statements).
“…We expected judging vagueness at the sentence level to be an ambiguous task, which is reflected in rather low reliability scores. The annotator reliability is comparable to, if not better than, that in [25], who report that 4/5 of their annotators agreed on 13% of the descriptions, which, in our case, is 13% for 3/3 annotators. In 47% of their cases, 3/5 annotators agreed, while in our case 2/3 annotators agreed on 70% of the cases.…”
Section: Annotations (supporting)
confidence: 74%
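
As a minimal sketch of the n-of-m agreement figures quoted above, the following Python snippet computes, for hypothetical annotator labels, the fraction of items on which at least k annotators chose the same label:

from collections import Counter

def agreement_fraction(labels_per_item, k):
    # Fraction of items whose most common label was chosen
    # by at least k annotators.
    agree = sum(
        1 for labels in labels_per_item
        if Counter(labels).most_common(1)[0][1] >= k
    )
    return agree / len(labels_per_item)

# Hypothetical binary vagueness labels from three annotators
# for five sentences (not data from either paper).
items = [(1, 1, 1), (1, 1, 0), (0, 0, 0), (1, 0, 1), (0, 1, 0)]
print(agreement_fraction(items, 3))  # unanimous 3/3 agreement -> 0.4
print(agreement_fraction(items, 2))  # majority 2/3 agreement -> 1.0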
“…As described in the Measures section, we enrich the user-generated information need descriptions with a vagueness score. We followed the method of Lebanoff and Liu [25], who use crowdsourcing with native English speakers to annotate the level of vagueness of a sentence. Four annotators were recruited on Amazon Mechanical Turk to each label all user-generated descriptions on a scale from 1 ("very specific, not vague at all") to 10 ("very vague, not specific at all").…”
Section: Annotations (mentioning)
confidence: 99%
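
The quoted passage does not say how the four ratings per description are aggregated into a single vagueness score; a simple averaging scheme, shown here purely as an assumption, would look like:

from statistics import mean

def vagueness_score(ratings):
    # Average the per-annotator ratings (1 = very specific,
    # 10 = very vague). Averaging is an assumed aggregation,
    # not one stated in the quoted passage.
    return mean(ratings)

print(vagueness_score([3, 5, 4, 6]))  # four MTurk ratings -> 4.5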
“…This structured querying technique offers an advantage over approaches based on heuristics and keyword analysis (e.g., [3,6]); it allows us to better cover text with varying wordings but similar semantics. Further, this technique avoids the shortcomings of other approaches that directly use machine learning to quantify the goals (e.g., [5,10,12]); it is more flexible for adapting the goals (i.e., queries) as needed, without having to create new labeled data for each new goal.…”
Section: Methodology Overview (mentioning)
confidence: 99%
“…Passive Voice Index: gives the percentage of sentences that contain passive verb forms. To compute this score, we tokenize the text into sentences and perform dependency parsing on each sentence using the spaCy library. We consider a sentence to contain a passive construction if it matches the pattern nsubjpass (passive nominal subject), followed by aux (auxiliary), followed by auxpass (passive auxiliary).…”
Section: Text Metrics (mentioning)
confidence: 99%
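
The passive-voice pattern described above maps directly onto spaCy's dependency labels. The sketch below is one possible implementation under that reading; note that the strict nsubjpass, then aux, then auxpass sequence matches modal or progressive passives ("may be shared", "was being collected") but would miss simple passives such as "The data was collected", where no intermediate aux occurs:

import spacy

nlp = spacy.load("en_core_web_sm")  # sentence splitting plus dependency parser

def passive_voice_index(text):
    # Percentage of sentences matching the pattern
    # nsubjpass, then aux, then auxpass (in that order).
    sents = list(nlp(text).sents)
    if not sents:
        return 0.0

    def is_passive(sent):
        deps = [tok.dep_ for tok in sent]
        try:
            i = deps.index("nsubjpass")       # passive nominal subject
            j = deps.index("aux", i + 1)      # auxiliary after the subject
            deps.index("auxpass", j + 1)      # passive auxiliary after that
            return True
        except ValueError:
            return False

    return 100.0 * sum(is_passive(s) for s in sents) / len(sents)

print(passive_voice_index("Your data may be shared with third parties."))  # -> 100.0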
“…We use the total score of each dimension in the answer as a feature. We also consider vagueness cue words (Bhatia et al., 2016; Lebanoff and Liu, 2018). This set of features (40 cue words) is represented by the frequency of the vagueness cues in the answer.…”
Section: Analyzing the Language of Defence (mentioning)
confidence: 99%
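
Since the quoted passage specifies frequency features over a fixed list of 40 vagueness cue words, the feature extraction might look as follows; the cue list here is a small illustrative sample, not the actual 40-word list from the cited work:

import re
from collections import Counter

# Illustrative sample of vagueness cues; the cited work uses a
# fixed set of 40 (this list is an assumption, not the real one).
VAGUE_CUES = ["may", "might", "generally", "typically",
              "certain", "some", "appropriate", "necessary"]

def cue_frequency_features(answer):
    # One feature per cue word: its token frequency in the answer.
    tokens = re.findall(r"[a-z']+", answer.lower())
    counts = Counter(tokens)
    return [counts[cue] for cue in VAGUE_CUES]

print(cue_frequency_features(
    "We may share certain data with some partners when appropriate."))
# -> [1, 0, 0, 0, 1, 1, 1, 0]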