Proceedings of the Web Conference 2020 2020
DOI: 10.1145/3366423.3380239

Filter List Generation for Underserved Regions

Abstract: Filter lists play a large and growing role in protecting and assisting web users. The vast majority of popular filter lists are crowd-sourced: a large number of people manually label undesirable web resources (e.g., ads, trackers, paywall libraries) so that browsers and extensions can block them. Because only a small percentage of web users participate in the generation of filter lists, a crowd-sourcing strategy works well for blocking either uncommon resources that appear on …


Cited by 21 publications (11 citation statements)
References 12 publications
“…One line of prior work aims to develop ML models to automatically generate filter rules for blocking ads [11,36,74]. Bhagavatula et al. [11] (BD+) trained supervised ML classifiers to detect advertising URLs.…”
Section: Background and Related Work
confidence: 99%
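The approach this excerpt attributes to Bhagavatula et al. (supervised classification of advertising URLs) can be sketched in miniature. Everything below is illustrative: the training URLs, keyword list, and features are invented for the example, and a hand-rolled perceptron stands in for their actual classifiers; none of it is taken from the cited paper.

```python
# Toy labeled data: (url, label) where 1 = advertising, 0 = benign.
# URLs and labels are invented for illustration.
TRAIN = [
    ("https://ads.example.com/banner?size=300x250", 1),
    ("https://tracker.example.net/pixel.gif", 1),
    ("https://cdn.example.org/lib/jquery.js", 0),
    ("https://example.com/images/logo.png", 0),
    ("https://syndication.example.com/adframe.html", 1),
    ("https://example.org/article/page.html", 0),
]

# Hypothetical keyword list; real filter lists use far richer signals.
AD_KEYWORDS = ("ads", "banner", "tracker", "pixel", "adframe", "syndication")

def features(url):
    """Map a URL to a small numeric feature vector."""
    return [
        len(url) / 100.0,                      # scaled URL length
        sum(kw in url for kw in AD_KEYWORDS),  # keyword hit count
        1.0 if "?" in url else 0.0,            # has a query string
        url.count("/") / 10.0,                 # scaled path depth
    ]

def train_perceptron(data, epochs=20, lr=0.5):
    """Train a simple perceptron: w.x + b > 0 predicts 'ad'."""
    w = [0.0] * len(features(data[0][0]))
    b = 0.0
    for _ in range(epochs):
        for url, y in data:
            x = features(url)
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = y - pred
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def predict(w, b, url):
    x = features(url)
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
```

On this tiny, linearly separable toy set the keyword-count feature dominates, so the perceptron separates the classes after a few epochs; real URL classifiers need much larger feature sets and training corpora.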
“…The research community is actively working on machine learning (ML) approaches to assist with filter rule generation [11,36,74] or to build models to replace filter lists altogether [1,41,73,89]. There are two key limitations of prior ML-based approaches.…”
Section: Introduction
confidence: 99%
“…Graph-based approaches extract features from the crosslayer graph representation to train ML models to detect ads and trackers [18,19]. These approaches leverage rich crosslayer context and thus claim to be robust to evasion attempts.…”
Section: Background and Related Work
confidence: 99%
“…It extracts structural features from the graph such as node connectivity and ancestry information as well as content features such as URL length and presence/absence of certain keywords. Sjösten et al [19] introduced PageGraph, which extends ADGRAPH's graph representation by improving event attribution and capturing more behaviors. In addition to content and structural features, they also added perceptual features to train the classifier.…”
Section: Background and Related Work
confidence: 99%
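The feature families this excerpt names — structural features such as node connectivity and ancestry, and content features such as URL length and keyword presence — can be illustrated on a toy request graph. The graph shape, node names, and feature set below are assumptions for illustration only, not ADGRAPH's or PageGraph's actual representation or schema.

```python
from collections import deque

# Toy page graph: edges point from a creating element to what it created.
# Structure and labels are invented for the example.
GRAPH = {
    "document": ["script1", "img1"],
    "script1":  ["iframe1"],
    "iframe1":  ["req_ad"],
    "img1":     [],
    "req_ad":   [],
}
URLS = {
    "req_ad": "https://ads.example.com/banner.js",
    "img1":   "https://example.com/logo.png",
}

def depth_from_root(graph, root, node):
    """Ancestry depth: number of edges from root to node (BFS); -1 if unreachable."""
    seen = {root: 0}
    q = deque([root])
    while q:
        cur = q.popleft()
        for child in graph.get(cur, []):
            if child not in seen:
                seen[child] = seen[cur] + 1
                q.append(child)
    return seen.get(node, -1)

def node_features(graph, urls, node, root="document"):
    """Per-node feature dict mixing structural and content signals."""
    url = urls.get(node, "")
    return {
        "out_degree": len(graph.get(node, [])),       # structural: connectivity
        "depth": depth_from_root(graph, root, node),  # structural: ancestry
        "url_length": len(url),                       # content
        "has_ad_keyword": any(k in url for k in ("ads", "banner", "track")),
    }
```

Here `req_ad` sits three hops below the document (loaded via a script-created iframe) and carries an ad-keyword URL — exactly the kind of cross-layer context a flat URL classifier cannot see.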