Proceedings of the Internet Measurement Conference 2018
DOI: 10.1145/3278532.3278545

Characterizing the Internet Host Population Using Deep Learning

Abstract: In this paper, we present a framework to characterize Internet hosts using deep learning, using Internet scan data to produce numerical and lightweight (low-dimensional) representations of hosts. To do so we first develop a novel method for extracting binary tags from structured texts, the format of the scan data. We then use a variational autoencoder, an unsupervised neural network model, to construct low-dimensional embeddings of our high-dimensional binary representations. We show that these lightweight emb…
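To make the first step of the abstract concrete, here is a minimal sketch of how structured scan records could be turned into binary tag vectors. It is an illustration only, not the authors' extraction method: the `path=value` tag scheme, the function names `flatten` and `binary_matrix`, and the sample host records are all assumptions introduced here.

```python
from typing import Any, Dict, Iterator, List, Tuple


def flatten(record: Any, prefix: str = "") -> Iterator[str]:
    """Yield one 'path=value' tag per leaf of a nested dict/list record."""
    if isinstance(record, dict):
        for key, value in record.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(record, list):
        # List items share the parent's path (an assumed simplification).
        for item in record:
            yield from flatten(item, prefix)
    else:
        yield f"{prefix}={record}"


def binary_matrix(records: List[Dict[str, Any]]) -> Tuple[List[str], List[List[int]]]:
    """Build a sorted tag vocabulary and one 0/1 row per record."""
    tag_sets = [set(flatten(r)) for r in records]
    vocabulary = sorted(set().union(*tag_sets))
    rows = [[1 if tag in tags else 0 for tag in vocabulary] for tags in tag_sets]
    return vocabulary, rows


# Hypothetical scan records; the field names are illustrative only.
hosts = [
    {"80": {"http": {"server": "nginx"}}, "22": {"ssh": {"version": "2.0"}}},
    {"80": {"http": {"server": "Apache"}}},
]

vocabulary, rows = binary_matrix(hosts)
for tag in vocabulary:
    print(tag)
for row in rows:
    print(row)
```

Each row is a high-dimensional binary representation of one host. In the framework the abstract describes, vectors of this kind are what the variational autoencoder would compress into low-dimensional embeddings.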

Cited by 13 publications (5 citation statements)
References 21 publications
“…The network behavior analysis provides critical insights into traffic characterization and classifications for network applications [46], end systems [47,48], and Internet users [49][50][51]. The rich set of multidimensional and multi-layer traffic features from network behavior analysis not only characterizes traffic patterns of the Internet "objects", i.e., Internet applications, network systems, and end users but also enables accurate classifications and detection on unknown or anomalous network traffic [52].…”
Section: Benefits Of Network Behavior Analysis (mentioning)
confidence: 99%
“…Iglesias and Zseby [20] have shown how to cluster IBR data from Darknet based on a novel representation of network traffic to identify network traffic patterns that are characteristic of activities such as long term scanning, as well as bursty events from targeted attacks and short term incidents. Finally, Sarabi and Liu [39] employ deep learning for obtaining lightweight embeddings to characterize the population of Internet hosts as observed by scanning services such as Censys.io.…”
Section: Related Work (mentioning)
confidence: 99%
“…Motivated by the recent success of deep representation learning, we employ the idea of autoencoders [27,39,44] to learn low-dimensional numerical embeddings of the input data. The resulting heterogeneity of the input data features, their high dimensionality, and the need to cope with potentially non-linear interactions between features motivated us to employ deep representation learning to address these challenges.…”
Section: Representation Learning (mentioning)
confidence: 99%
“…Nonetheless, the platform has been extensively used by researchers to either label existing data or collect datasets for training and evaluation of algorithms. While past focus has been on both files [27][28][29][30][31][32][33][34][35][36] as well as suspicious IP addresses and URLs [37][38][39][40][41][42][43][44][45][46][47][48], the work herein aims to cluster and understand the dynamics of submitted URLs within the VirusTotal platform. That is, we are primarily concerned about characterising the URLs themselves via the metadata available for each submission, as opposed to the raw content that they point to.…”
Section: VirusTotal Platform (mentioning)
confidence: 99%