Proceedings of the Web Conference 2020 2020
DOI: 10.1145/3366423.3380239

Filter List Generation for Underserved Regions

Abstract: Filter lists play a large and growing role in protecting and assisting web users. The vast majority of popular filter lists are crowd-sourced: a large number of people manually label undesirable web resources (e.g., ads, trackers, paywall libraries) so that browsers and extensions can block them. Because only a small percentage of web users participate in the generation of filter lists, a crowd-sourcing strategy works well for blocking either uncommon resources that appear on …


Cited by 21 publications (11 citation statements)
References 12 publications
“…One line of prior work aims to develop ML models to automatically generate filter rules for blocking ads [11,36,74]. Bhagavatula et al. [11] (BD+) trained supervised ML classifiers to detect advertising URLs.…”
Section: Background and Related Work
confidence: 99%
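The approach this excerpt attributes to Bhagavatula et al. (supervised classification of advertising URLs) can be sketched in miniature. Everything below is illustrative: the training URLs, keyword list, and features are invented for the example, and a hand-rolled perceptron stands in for their actual classifiers; none of it is taken from the cited paper.

```python
# Toy labeled data: (url, label) where 1 = advertising, 0 = benign.
# URLs and labels are invented for illustration.
TRAIN = [
    ("https://ads.example.com/banner?size=300x250", 1),
    ("https://tracker.example.net/pixel.gif", 1),
    ("https://cdn.example.org/lib/jquery.js", 0),
    ("https://example.com/images/logo.png", 0),
    ("https://syndication.example.com/adframe.html", 1),
    ("https://example.org/article/page.html", 0),
]

# Hypothetical keyword list; real filter lists use far richer signals.
AD_KEYWORDS = ("ads", "banner", "tracker", "pixel", "adframe", "syndication")

def features(url):
    """Map a URL to a small numeric feature vector."""
    return [
        len(url) / 100.0,                      # scaled URL length
        sum(kw in url for kw in AD_KEYWORDS),  # keyword hit count
        1.0 if "?" in url else 0.0,            # has a query string
        url.count("/") / 10.0,                 # scaled path depth
    ]

def train_perceptron(data, epochs=20, lr=0.5):
    """Train a simple perceptron: w.x + b > 0 predicts 'ad'."""
    w = [0.0] * len(features(data[0][0]))
    b = 0.0
    for _ in range(epochs):
        for url, y in data:
            x = features(url)
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = y - pred
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def predict(w, b, url):
    x = features(url)
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
```

On this tiny, linearly separable toy set the keyword-count feature dominates, so the perceptron separates the classes after a few epochs; real URL classifiers need much larger feature sets and training corpora.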
“…The research community is actively working on machine learning (ML) approaches to assist with filter rule generation [11,36,74] or to build models to replace filter lists altogether [1,41,73,89]. There are two key limitations of prior ML-based approaches.…”
Section: Introduction
confidence: 99%
“…Graph-based approaches extract features from the crosslayer graph representation to train ML models to detect ads and trackers [18,19]. These approaches leverage rich crosslayer context and thus claim to be robust to evasion attempts.…”
Section: Background and Related Work
confidence: 99%
“…It extracts structural features from the graph such as node connectivity and ancestry information as well as content features such as URL length and presence/absence of certain keywords. Sjösten et al [19] introduced PageGraph, which extends ADGRAPH's graph representation by improving event attribution and capturing more behaviors. In addition to content and structural features, they also added perceptual features to train the classifier.…”
Section: Background and Related Work
confidence: 99%
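The feature families this excerpt names — structural features such as node connectivity and ancestry, and content features such as URL length and keyword presence — can be illustrated on a toy request graph. The graph shape, node names, and feature set below are assumptions for illustration only, not ADGRAPH's or PageGraph's actual representation or schema.

```python
from collections import deque

# Toy page graph: edges point from a creating element to what it created.
# Structure and labels are invented for the example.
GRAPH = {
    "document": ["script1", "img1"],
    "script1":  ["iframe1"],
    "iframe1":  ["req_ad"],
    "img1":     [],
    "req_ad":   [],
}
URLS = {
    "req_ad": "https://ads.example.com/banner.js",
    "img1":   "https://example.com/logo.png",
}

def depth_from_root(graph, root, node):
    """Ancestry depth: number of edges from root to node (BFS); -1 if unreachable."""
    seen = {root: 0}
    q = deque([root])
    while q:
        cur = q.popleft()
        for child in graph.get(cur, []):
            if child not in seen:
                seen[child] = seen[cur] + 1
                q.append(child)
    return seen.get(node, -1)

def node_features(graph, urls, node, root="document"):
    """Per-node feature dict mixing structural and content signals."""
    url = urls.get(node, "")
    return {
        "out_degree": len(graph.get(node, [])),       # structural: connectivity
        "depth": depth_from_root(graph, root, node),  # structural: ancestry
        "url_length": len(url),                       # content
        "has_ad_keyword": any(k in url for k in ("ads", "banner", "track")),
    }
```

Here `req_ad` sits three hops below the document (loaded via a script-created iframe) and carries an ad-keyword URL — exactly the kind of cross-layer context a flat URL classifier cannot see.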