Paul K. Mvula scite author profile

Paul K. Mvula

4 Publications

1 Citation Statement Received

25 Citation Statements Given

Affiliations

University of Ottawa

Publications

A systematic literature review of cyber-security data repositories and performance assessment metrics for semi-supervised learning

et al. 2023

In Machine Learning, the datasets used to build models are one of the main factors limiting what these models can achieve and how good their predictive performance is. Machine Learning applications for cyber-security or computer security are numerous, including cyber threat mitigation and security infrastructure enhancement through pattern recognition, real-time attack detection, and in-depth penetration testing. For these applications in particular, the datasets used to build the models must therefore be carefully chosen to be representative of real-world data. However, because of the scarcity of labelled data and the cost of manually labelling positive examples, there is a growing corpus of literature utilizing Semi-Supervised Learning with cyber-security data repositories. In this work, we provide a comprehensive overview of publicly available data repositories and datasets used for building computer security or cyber-security systems based on Semi-Supervised Learning, where only a few labels are necessary or available for building strong models. We highlight the strengths and limitations of the data repositories and datasets and provide an analysis of the performance assessment metrics used to evaluate the built models. Finally, we discuss open challenges and provide future research directions for using cyber-security datasets and evaluating models built upon them.

COVID-19 malicious domain names classification

Mvula, Branco, Jourdan, et al. 2022

Expert Systems with Applications

Evaluating Word Embedding Feature Extraction Techniques for Host-Based Intrusion Detection Systems

et al. 2023

Research into Intrusion and Anomaly Detectors at the Host level typically pays much attention to extracting attributes from system call traces. These include window-based, Hidden Markov Models, and sequence-model-based attributes. Recently, several works have been focusing on sequence-model-based feature extractors, specifically Word2Vec and GloVe, to extract embeddings from the system call traces due to their ability to capture semantic relationships among system calls. However, due to the nature of the data, these extractors introduce inconsistencies in the extracted features, causing the Machine Learning models built on them to yield inaccurate and potentially misleading results. In this paper, we first highlight the research challenges posed by these extractors. Then, we conduct experiments with new feature sets assessing their suitability to address the detected issues. Our experiments show that Word2Vec is prone to introducing more duplicated samples than GloVe. Regarding the solutions proposed, we found that concatenating the embedding vectors generated by Word2Vec and GloVe yields the overall best balanced accuracy. In addition to resolving the challenge of data leakage, this approach enables an improvement in performance relative to other alternatives.

Measuring Improvement of F₁-Scores in Detection of Self-Admitted Technical Debt

Aiken, Mvula, Branco, et al. 2023

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Part of the Research Solutions Family.
