Botnet detection systems struggle with performance and privacy issues when analyzing data from large-scale networks. Deep packet inspection, reverse engineering, clustering, and other time-consuming approaches are infeasible for large-scale networks. Therefore, many researchers focus on fast and simple botnet detection methods that use as little information as possible to avoid privacy violations. We present a novel technique for detecting malware that uses Domain Generation Algorithms (DGAs); it can evaluate data from large-scale networks without reverse engineering a binary or performing Non-Existent Domain (NXDomain) inspection. We propose a statistical approach that models the ratio of DNS requests to visited IPs for every host in the local network and labels deviations from this model as DGA-performing malware. We expect the malware to try to resolve more domains during a small time interval without a corresponding number of newly visited IPs. For this we need only the NetFlow/IPFIX statistics collected from the network of interest, which can be generated by almost any modern router. We show that with this approach we are able to identify DGA-based malware with zero to very few false positives. Because of the simplicity of our approach, we can inspect data from very large networks with minimal computational cost.
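The idea of comparing DNS requests against newly visited IPs per host can be illustrated with a minimal sketch. The abstract does not specify the statistical model, so everything below is an assumption made for illustration: the flow tuple layout `(timestamp, src_host, dst_ip, dst_port)`, the 600-second window, the `+ 1` smoothing in the ratio, and the median-plus-MAD threshold (a simple stand-in for the authors' model, not a reproduction of it). The function names `dns_to_ip_ratios` and `flag_outliers` are hypothetical.

```python
from collections import defaultdict
from statistics import median

DNS_PORT = 53  # assumption: DNS requests are identified by destination port


def dns_to_ip_ratios(flows, window_seconds=600):
    """Return {(host, window_index): dns_requests / (new_ips + 1)}.

    flows: iterable of (timestamp, src_host, dst_ip, dst_port) tuples,
    a hypothetical simplification of NetFlow/IPFIX records.
    """
    dns_counts = defaultdict(int)
    new_ip_counts = defaultdict(int)
    seen_ips = defaultdict(set)  # per-host set of already visited IPs

    # Process flows in time order so "newly visited" is well defined.
    for ts, host, dst_ip, dst_port in sorted(flows, key=lambda f: f[0]):
        key = (host, int(ts // window_seconds))
        if dst_port == DNS_PORT:
            dns_counts[key] += 1
        elif dst_ip not in seen_ips[host]:
            seen_ips[host].add(dst_ip)
            new_ip_counts[key] += 1

    keys = set(dns_counts) | set(new_ip_counts)
    # The +1 avoids division by zero; it is an illustrative choice.
    return {k: dns_counts[k] / (new_ip_counts[k] + 1) for k in keys}


def flag_outliers(ratios, k=5.0):
    """Flag (host, window) pairs whose ratio exceeds median + k * MAD.

    The MAD is floored at 1.0 so that a population of identical ratios
    does not produce a zero-width threshold (an assumption, not a
    parameter taken from the paper).
    """
    values = list(ratios.values())
    if not values:
        return []
    med = median(values)
    mad = median(abs(v - med) for v in values)
    threshold = med + k * max(mad, 1.0)
    return sorted(key for key, v in ratios.items() if v > threshold)
```

As a usage example, a host that issues 50 DNS requests while contacting a single new IP stands out against hosts whose DNS requests and new IPs grow together:

```python
flows = []
for i in range(5):  # five ordinary hosts: 3 DNS requests, 3 new IPs each
    for j in range(3):
        flows.append((float(j), f"host{i}", "10.0.0.53", 53))
        flows.append((float(j) + 0.5, f"host{i}", f"93.184.{i}.{j}", 443))
for j in range(50):  # one suspicious host: many lookups, one new IP
    flows.append((float(j), "bot", "10.0.0.53", 53))
flows.append((60.0, "bot", "198.51.100.7", 443))

suspicious = flag_outliers(dns_to_ip_ratios(flows))
```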
Deception technologies, and honeypots in particular, have been used for decades to understand how cyber attacks and attackers work. A myriad of factors impact the effectiveness of a honeypot. However, very little is known about the impact of the geographical location of honeypots on the amount and type of attacks. Hornet 40 is the first dataset designed to help understand how the geolocation of honeypots may impact the inflow of network attacks. The data consists of network flows in binary and text format, with up to 118 features, including 480 bytes of the content of each flow. The flows were created using the Argus flow collector. The passive honeypots are IP addresses connected to the Internet with no honeypot software running, so attacks are not interactive. The data was collected from identically configured honeypot servers in eight locations: Amsterdam, Bangalore, Frankfurt, London, New York, San Francisco, Singapore, and Toronto. The dataset contains over 4.7 million network flows collected over forty days in April, May, and June 2021.