Content filtering technologies are often used for Internet censorship, but even as these technologies have become cheaper and easier to deploy, the censorship measurement community lacks a systematic approach to monitoring their proliferation. Past research has focused on a handful of specific filtering technologies, each of which required cumbersome manual detective work to identify. Researchers and policymakers need a more comprehensive picture of the state and evolution of censorship based on content filtering in order to establish effective policies that protect Internet freedom. In this work, we present FilterMap, a novel framework that can scalably monitor content filtering technologies based on their blockpages. FilterMap first combines in-network and new remote censorship measurement techniques to gather blockpages from filter deployments. We then show how the observed blockpages can be clustered, generating signatures for longitudinal tracking. FilterMap outputs a map of regions of address space in which the same blockpages appear (corresponding to filter deployments), and each unique blockpage is manually verified to avoid false positives. By collecting and analyzing more than 379 million measurements from 45,000 vantage points against more than 18,000 sensitive test domains, we identify filter deployments associated with 90 vendors and actors and observe filtering in 103 countries. We detect the use of commercial filtering technologies for censorship in 36 of the 48 countries labeled as 'Not Free' or 'Partly Free' by the Freedom House "Freedom on the Net" report [26]. The unrestricted transfer of content filtering technologies has made highly available, low-cost, and highly effective filtering easier to deploy and harder to circumvent. Identifying these filter deployments highlights policy and corporate social responsibility issues and adds accountability to filter manufacturers.
Our continued publication of FilterMap data will help the international community track the scope, scale, and evolution of content-based censorship.
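The clustering step described above can be illustrated with a minimal sketch. This is not FilterMap's actual pipeline; it assumes a simple token-set similarity and a greedy grouping, where each resulting cluster could serve as a blockpage signature for longitudinal tracking. All function names and the threshold are assumptions for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    return len(a & b) / len(a | b) if a | b else 1.0

def cluster_blockpages(pages, threshold=0.7):
    """Greedy single-pass clustering: a page joins the first cluster whose
    representative it resembles closely enough, otherwise it starts a new
    cluster. Each cluster's representative token set acts as a signature."""
    clusters = []  # each entry: (representative token set, [member pages])
    for page in pages:
        tokens = set(page.lower().split())
        for rep, members in clusters:
            if jaccard(tokens, rep) >= threshold:
                members.append(page)
                break
        else:
            clusters.append((tokens, [page]))
    return [members for _, members in clusters]

pages = [
    "Access Denied This site is blocked by order of the authority",
    "Access denied this site is blocked by order of the authority",
    "404 Not Found nginx",
]
print(len(cluster_blockpages(pages)))  # 2: the blockpage pair vs. the 404
```

A production system would compare structural features of the HTML as well as text, but the grouping idea is the same: near-identical blockpages collapse into one signature that can then be tracked across address space and time.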
This study offers a first step toward understanding the extent to which cyber security incidents, which can be of many types, can be predicted by applying machine learning techniques to externally observed malicious activities associated with network entities, such as spamming, phishing, and scanning, each of which may or may not bear directly on a specific attack mechanism or incident type. Our hypothesis is that malicious activities originating from a network, viewed collectively, indicate the general cleanness of the network and how well it is run, and that their collective behavior is fairly stable over time and therefore predictive. To test this hypothesis, we use two datasets: (1) a collection of commonly used IP address-based host reputation blacklists (RBLs) collected over more than a year, and (2) a set of security incident reports collected over roughly the same period. Specifically, we first aggregate the RBL data at the prefix level and then introduce a set of features that capture the dynamics of this aggregated temporal process. Comparing the distribution of these feature values between the incident dataset and the general population of prefixes reveals distinct differences, suggesting their value in distinguishing the two and highlighting the importance of capturing dynamic behavior (second-order statistics) in the malicious activities. These features are then used to train a support vector machine (SVM) for prediction. Our preliminary results show reasonably good prediction performance over a forecasting window of a few months.
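The feature-extraction step can be sketched as follows. This is an illustrative toy, not the paper's exact feature set: given a daily time series of how many of a prefix's IPs appear on RBLs, it derives features capturing both the level of malicious activity and its dynamics (second-order behavior), which could then feed a classifier such as an SVM. The feature names and the example series are assumptions.

```python
from statistics import mean, pvariance

def rbl_features(daily_listed, prefix_size):
    """Feature vector for one prefix: average fraction of the prefix listed
    on RBLs, the variance of that fraction over time (dynamics), and the
    largest day-to-day jump."""
    fracs = [n / prefix_size for n in daily_listed]
    jumps = [abs(b - a) for a, b in zip(fracs, fracs[1:])] or [0.0]
    return {
        "avg_listed": mean(fracs),
        "var_listed": pvariance(fracs),
        "max_jump": max(jumps),
    }

# A hypothetical /24 prefix with a brief burst of listed IPs.
feats = rbl_features([0, 2, 8, 8, 1], prefix_size=256)
```

Two prefixes with the same average listing level can differ sharply in variance and jump size, which is why the dynamic features matter for distinguishing incident-prone networks from the general population.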
In this paper, we present a framework for characterizing Internet hosts using deep learning, which turns Internet scan data into numerical, lightweight (low-dimensional) representations of hosts. To do so, we first develop a novel method for extracting binary tags from structured texts, the format of the scan data. We then use a variational autoencoder, an unsupervised neural network model, to construct low-dimensional embeddings of our high-dimensional binary representations. We show that these lightweight embeddings retain most of the information in the binary representations while drastically reducing the memory and computational requirements of large-scale analysis. The embeddings are also universal: the process that generates them is unsupervised and does not rely on any specific application, which makes them broadly applicable as input features to a variety of learning tasks. We present two such examples, (1) detecting and predicting malicious hosts, and (2) unmasking hidden host attributes, and compare the trained models in performance, speed, robustness, and interpretability. We show that our embeddings achieve high accuracy (>95%) on these learning tasks while being fast enough to enable host-level analysis at scale.
In this paper, we present a game-theoretic analysis of ransomware. To this end, we provide theoretical and empirical analysis of a two-player Attacker-Defender (A-D) game, as well as a Defender-Insurer (D-I) game; in the latter, the attacker is assumed to be a non-strategic third party. Our model assumes that the defender can invest in two types of protection against ransomware attacks: (1) general protection through a deterrence effort, making attacks less likely to succeed, and (2) a backup effort serving the purpose of recourse, allowing the defender to recover from successful attacks. The attacker then decides on a ransom amount in the event of a successful attack, and the defender chooses either to pay the ransom immediately or to first attempt to recover their data, bearing a recovery cost. Recovery is not guaranteed to succeed, so the defender may eventually pay the demanded ransom anyway. Our analysis of the A-D game shows that the equilibrium falls into one of three scenarios: (1) the defender pays the ransom immediately without having invested any effort in backup, (2) the defender pays the ransom while leveraging backups as a credible threat to force a lower ransom demand, and (3) the defender tries to recover the data, paying the ransom only when recovery fails. We observe that the backup effort is abandoned entirely when recovery is too expensive, leading to the worst-case first scenario, which rules out recovery. Furthermore, our analysis of the D-I game suggests that the introduction of insurance leads to moral hazard as expected, with the defender reducing their efforts; less obvious is the observation that this reduction falls mostly on the backup effort.
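The defender's pay-versus-recover decision can be illustrated with a toy expected-cost comparison. This is not the paper's full equilibrium analysis, only a sketch of the intuition behind scenarios (1) and (3): recovery costs are always incurred when attempted, and the ransom is paid only if recovery fails. All parameter values are assumptions.

```python
def pay_immediately(ransom):
    """Cost of paying the ransom right away."""
    return ransom

def try_recovery_first(ransom, recovery_cost, p_recover):
    """Expected cost of attempting recovery: the recovery cost is always
    incurred, and the ransom is paid only when recovery fails."""
    return recovery_cost + (1 - p_recover) * ransom

def best_response(ransom, recovery_cost, p_recover):
    """Defender's cheaper option in expectation after a successful attack."""
    pay = pay_immediately(ransom)
    recover = try_recovery_first(ransom, recovery_cost, p_recover)
    return "pay" if pay <= recover else "recover"

# Cheap, reliable backups make recovery worthwhile; expensive recovery
# pushes the defender toward paying immediately (the worst-case scenario).
print(best_response(ransom=100, recovery_cost=10, p_recover=0.9))  # recover
print(best_response(ransom=100, recovery_cost=95, p_recover=0.5))  # pay
```

This mirrors the observation in the abstract: when recovery is too expensive, the backup option is effectively abandoned and the defender simply pays.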