Transcending Transcend: Revisiting Malware Classification in the Presence of Concept Drift

Barbero, Federico; Pendlebury, Feargus; Pierazzi, Fabio; Cavallaro, Lorenzo

doi:10.48550/arxiv.2010.03856

Cited by 2 publications

(4 citation statements)

References 18 publications

“…Many learning-based systems in security are evaluated solely in laboratory settings, overstating their practical impact. A common example are detection methods evaluated only in a closed-world setting with limited diversity and no consideration of non-stationarity [15,70]. For example, a large number of website fingerprinting attacks are evaluated only in closed-world settings spanning a limited time period [71].…”

Section: % Present

mentioning

confidence: 99%

“…However, public datasets need to be treated with caution. Firstly, data ages and becomes less relevant in the fast-moving security landscape, partially due to concept drift [15,70,85,101]. Secondly, the characteristics of the data are increasingly exposed and thereby lead to implicit data snooping (P3) [see 1,88].…”

Section: Data Collection and Labeling

mentioning

confidence: 99%

“…For example, temporal and spatial relations of the data should be considered to account for the typical dynamics encountered in the wild [see 101]. Similarly, runtime and storage constraints should be analyzed under practical conditions [see 15,107,126]. Ideally, the proposed system should be deployed to uncover problems that are not observable in a lab-only environment, such as the diversity and complexity of real-world network traffic [see 115].…”

Section: Deployment and Operation

mentioning

confidence: 99%

Dos and Don'ts of Machine Learning in Computer Security

Arp, Quiring, Pendlebury et al. 2020

Preprint

Self Cite

With the growing processing power of computing systems and the increasing availability of massive datasets, machine learning algorithms have led to major breakthroughs in many different areas. This development has influenced computer security, spawning a series of work on learning-based security systems, such as for malware detection, vulnerability discovery, and binary code analysis. Despite great potential, machine learning in security is prone to subtle pitfalls that undermine its performance and render learning-based systems potentially unsuitable for security tasks and practical deployment. In this paper, we look at this problem with critical eyes. First, we identify common pitfalls in the design, implementation, and evaluation of learning-based security systems. We conduct a longitudinal study of 30 papers from top-tier security conferences within the past 10 years, confirming that these pitfalls are widespread in the current security literature. In an empirical analysis, we further demonstrate how individual pitfalls can lead to unrealistic performance and interpretations, obstructing the understanding of the security problem at hand. As a remedy, we derive a list of actionable recommendations to support researchers and our community in avoiding pitfalls, promoting a sound design, development, evaluation, and deployment of learning-based systems for computer security.

“…For example, Jordaney et al [54] proposed the Transced framework to identify concept drift to establish prediction indicators. Barbero et al [55] based on the former framework for performing rejection classification, has improved efficiency and reduced computing expenses. For an ML-based classifier to be highly sustainable, it is critical to understand the underlying features: the ability to distinguish benign applications from malware and extract the changing pattern of those features through evolutionary processes [56].…”

mentioning

confidence: 99%

Early Detection of Abnormal Attacks in Software-Defined Networking Using Machine Learning Approaches

2022

Recent developments have made software-defined networking (SDN) a popular technology for solving the inherent problems of conventional distributed networks. The key benefit of SDN is the decoupling between the control plane and the data plane, which makes the network more flexible and easier to manage. SDN is a new generation network architecture; however, its configuration settings are centralized, making it vulnerable to hackers. Our study investigated the feasibility of applying artificial intelligence technology to detect abnormal attacks in an SDN environment based on the current unit network architecture; therefore, the concept of symmetry includes the sustainability of SDN applications and robust performance of machine learning (ML) models in the case of various malicious attacks. In this study, we focus on the early detection of abnormal attacks in an SDN environment. On detection of malicious traffic in SDN topology, the AI module in the topology is applied to detect and act against the attack source through machine learning algorithms, making the network architecture more flexible. Under multiple abnormal attacks, we propose a hierarchical multi-class (HMC) architecture to effectively address the imbalanced dataset problem and improve the performance of minority classes. The experimental results show that the decision tree, random forest, bagging, AdaBoost, and deep learning models exhibit the best performance for distributed denial-of-service (DDoS) attacks. In addition, for the imbalanced dataset problem of multiclass classification, our proposed HMC architecture performs better than previous single classifiers. We also simulated the SDN topology and scenario verification. In summary, we concatenated the AI module to enhance the security and effectiveness of SDN networks in a practical manner.
