Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d18-1387

Automatic Detection of Vague Words and Sentences in Privacy Policies

Abstract: Website privacy policies represent the single most important source of information for users to gauge how their personal data are collected, used and shared by companies. However, privacy policies are often vague and people struggle to understand the content. Their opaqueness poses a significant challenge to both users and policy regulators. In this paper, we seek to identify vague content in privacy policies. We construct the first corpus of human-annotated vague words and sentences and present empirical stud…

Cited by 25 publications (53 citation statements). References 21 publications (18 reference statements).
“…We expected judging vagueness at the sentence level to be an ambiguous task, which is reflected in rather low reliability scores. The annotator reliability is comparable to, if not better than, that in [25], who report that 4/5 of their annotators agreed on 13% of the descriptions, which, in our case, is 13% for 3/3 annotators. In 47% of their cases, 3/5 annotators agreed, while in our case 2/3 annotators agreed on 70% of the cases.…”
Section: Annotations (supporting)
confidence: 74%
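
As a minimal sketch of the n-of-m agreement figures quoted above, the following Python snippet computes, for hypothetical annotator labels, the fraction of items on which at least k annotators chose the same label:

from collections import Counter

def agreement_fraction(labels_per_item, k):
    # Fraction of items whose most common label was chosen
    # by at least k annotators.
    agree = sum(
        1 for labels in labels_per_item
        if Counter(labels).most_common(1)[0][1] >= k
    )
    return agree / len(labels_per_item)

# Hypothetical binary vagueness labels from three annotators
# for five sentences (not data from either paper).
items = [(1, 1, 1), (1, 1, 0), (0, 0, 0), (1, 0, 1), (0, 1, 0)]
print(agreement_fraction(items, 3))  # unanimous 3/3 agreement -> 0.4
print(agreement_fraction(items, 2))  # majority 2/3 agreement -> 1.0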
“…As described in the Measures section, we enrich the user-generated information need descriptions with a vagueness score. We followed the method of Lebanoff and Liu [25], who use crowdsourcing with native English speakers to annotate the level of vagueness of a sentence. Four annotators were recruited on Amazon Mechanical Turk to each label all user-generated descriptions on a scale from 1 ("very specific, not vague at all") to 10 ("very vague, not specific at all").…”
Section: Annotations (mentioning)
confidence: 99%
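
The quoted passage does not say how the four ratings per description are aggregated into a single vagueness score; a simple averaging scheme, shown here purely as an assumption, would look like:

from statistics import mean

def vagueness_score(ratings):
    # Average the per-annotator ratings (1 = very specific,
    # 10 = very vague). Averaging is an assumed aggregation,
    # not one stated in the quoted passage.
    return mean(ratings)

print(vagueness_score([3, 5, 4, 6]))  # four MTurk ratings -> 4.5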
“…This structured querying technique offers an advantage over approaches based on heuristics and keyword analysis (e.g., [3,6]); it allows us to better cover text with varying wordings but similar semantics. Further, this technique avoids the shortcomings of other approaches that directly use machine learning to quantify the goals (e.g., [5,10,12]); it is more flexible for adapting the goals (i.e., queries) as needed, without having to create new labeled data for each new goal.…”
Section: Methodology Overview (mentioning)
confidence: 99%
“…Passive Voice Index: gives the percentage of sentences that contain passive verb forms. To compute this score, we tokenize the text into sentences and perform dependency parsing on each sentence using the spaCy library. We consider a sentence to contain a passive construction if it matches the pattern nsubjpass (passive nominal subject), followed by aux (auxiliary), followed by auxpass (passive auxiliary).…”
Section: Text Metrics (mentioning)
confidence: 99%
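
The passive-voice pattern described above maps directly onto spaCy's dependency labels. The sketch below is one possible implementation under that reading; note that the strict nsubjpass, then aux, then auxpass sequence matches modal or progressive passives ("may be shared", "was being collected") but would miss simple passives such as "The data was collected", where no intermediate aux occurs:

import spacy

nlp = spacy.load("en_core_web_sm")  # sentence splitting plus dependency parser

def passive_voice_index(text):
    # Percentage of sentences matching the pattern
    # nsubjpass, then aux, then auxpass (in that order).
    sents = list(nlp(text).sents)
    if not sents:
        return 0.0

    def is_passive(sent):
        deps = [tok.dep_ for tok in sent]
        try:
            i = deps.index("nsubjpass")       # passive nominal subject
            j = deps.index("aux", i + 1)      # auxiliary after the subject
            deps.index("auxpass", j + 1)      # passive auxiliary after that
            return True
        except ValueError:
            return False

    return 100.0 * sum(is_passive(s) for s in sents) / len(sents)

print(passive_voice_index("Your data may be shared with third parties."))  # -> 100.0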
“…We use the total score of each dimension in the answer as a feature. We also consider vagueness cue words (Bhatia et al., 2016; Lebanoff and Liu, 2018). This set of features (40 cue words) is represented by the frequency of the vagueness cues in the answer.…”
Section: Analyzing the Language of Defence (mentioning)
confidence: 99%
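
Since the quoted passage specifies frequency features over a fixed list of 40 vagueness cue words, the feature extraction might look as follows; the cue list here is a small illustrative sample, not the actual 40-word list from the cited work:

import re
from collections import Counter

# Illustrative sample of vagueness cues; the cited work uses a
# fixed set of 40 (this list is an assumption, not the real one).
VAGUE_CUES = ["may", "might", "generally", "typically",
              "certain", "some", "appropriate", "necessary"]

def cue_frequency_features(answer):
    # One feature per cue word: its token frequency in the answer.
    tokens = re.findall(r"[a-z']+", answer.lower())
    counts = Counter(tokens)
    return [counts[cue] for cue in VAGUE_CUES]

print(cue_frequency_features(
    "We may share certain data with some partners when appropriate."))
# -> [1, 0, 0, 0, 1, 1, 1, 0]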