2023
DOI: 10.1007/s44248-023-00003-x

A systematic literature review of cyber-security data repositories and performance assessment metrics for semi-supervised learning

Abstract: In Machine Learning, the datasets used to build models are one of the main factors limiting what these models can achieve and how good their predictive performance is. Machine Learning applications for cyber-security or computer security are numerous, including cyber threat mitigation and security infrastructure enhancement through pattern recognition, real-time attack detection, and in-depth penetration testing. Therefore, for these applications in particular, the datasets used to build the models must be care…

Cited by 8 publications (2 citation statements)
References: 122 publications
“…The primary motivation for choosing semi-supervised learning is that labeling the data is time-consuming and/or expensive, whereas obtaining unlabeled data requires little time or expense. Semi-supervised learning is widely used in multiple research fields, including the medical field, where it is applied, for example, in oncology diagnostics and care [22,23]. The classification of histopathology and radiotherapy images is essential for various kinds of cancers, such as breast, lung, gastric, liver, colorectal, kidney, pancreatic, and uterine cancers [24][25][26][27][28][29], but adequate labeling by experts is often time-consuming and thus cost-ineffective.…”
Section: Discussion (mentioning, confidence: 99%)
“…SSL is a specific type of ML that requires only a limited number of labels or may work with partially labeled data to build these models. It is crucial that the datasets used in developing these models for cybersecurity accurately represent real-world data to ensure their effectiveness and relevance [ 34 ]. As a fundamental framework for understanding the links and interdependencies among various cybersecurity components from a complete cybersecurity perspective, Grobler et al offered the 3U's of cybersecurity: user, usage, and usability.…”
Section: Review (mentioning, confidence: 99%)
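The statement above describes semi-supervised learning (SSL) as building a model from partially labeled data. One common SSL technique is self-training with pseudo-labels, sketched below purely as an illustration; it is not a method taken from this review or from the citing paper. The sketch makes several assumptions. It uses 1-D toy features and a nearest-centroid base classifier. It marks unlabeled samples with -1, following scikit-learn's `semi_supervised` convention. It also uses a hypothetical confidence margin `threshold`. The names `self_train`, `centroids`, and `predict` are hypothetical helpers.

```python
from statistics import mean

UNLABELED = -1  # assumed marker for samples without a label


def centroids(points, labels):
    """Mean position of each class, computed from labeled points only."""
    classes = sorted({y for y in labels if y != UNLABELED})
    return {c: mean(x for x, y in zip(points, labels) if y == c) for c in classes}


def predict(cents, x):
    """Return (class, margin): the nearest centroid's class and how much
    closer it is than the runner-up (used here as a confidence score)."""
    dists = sorted((abs(x - c), cls) for cls, c in cents.items())
    best_dist, best_cls = dists[0]
    margin = dists[1][0] - best_dist if len(dists) > 1 else float("inf")
    return best_cls, margin


def self_train(points, labels, threshold=1.0, max_iter=10):
    """Hypothetical self-training loop: fit on labeled data, pseudo-label
    unlabeled points whose margin clears `threshold`, then refit."""
    labels = list(labels)  # work on a copy; the caller's labels are untouched
    for _ in range(max_iter):
        cents = centroids(points, labels)
        added = False
        for i, (x, y) in enumerate(zip(points, labels)):
            if y != UNLABELED:
                continue
            cls, margin = predict(cents, x)
            if margin >= threshold:
                labels[i] = cls  # accept a confident pseudo-label
                added = True
        if not added:  # stop once no new pseudo-labels are accepted
            break
    return labels


# Toy data: only two of nine samples carry a label.
points = [0.0, 0.5, 1.0, 1.5, 5.0, 8.5, 9.0, 9.5, 10.0]
labels = [0, -1, -1, -1, -1, -1, -1, -1, 1]
print(self_train(points, labels))  # → [0, 0, 0, 0, -1, 1, 1, 1, 1]
```

The midpoint 5.0 stays unlabeled because it is equidistant from both refitted centroids (0.75 and 9.25). Its margin of 0 never clears the threshold. Holding back low-confidence pseudo-labels like this helps limit how far early mistakes propagate, which matters when the few available labels are expensive to obtain.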