Machine learning has become a cornerstone of analysing and extracting information from data, and missing values are a problem it frequently encounters. Missingness is commonly classified by mechanism as missing completely at random, missing at random, or missing not at random. Any of these may result from system malfunction during data collection or human error during data pre-processing. It is important to deal with missing values before analysing data, since ignoring or omitting them may result in biased or misinformed analysis. Several approaches for handling missing values have been proposed in the literature. In this paper, we aggregate some of the literature on missing data, focusing particularly on machine learning techniques. We also give insight into how the machine learning approaches work by highlighting the key features of missing value imputation techniques, how they perform, their limitations, and the kind of data they are most suitable for. We propose and evaluate two methods: k nearest neighbor imputation and an iterative imputation method (missForest) based on the random forest algorithm. Evaluation is performed on the Iris data set and on novel power plant fan data, with missing values induced at missingness rates of 5% to 20%. We show that both missForest and k nearest neighbor imputation can successfully handle missing values, and we offer some possible directions for future research.
Machine learning has become a cornerstone of analysing and extracting information from data, and missing values are a problem it frequently encounters. Missingness is commonly classified by mechanism as missing completely at random, missing at random, or missing not at random. Any of these may result from system malfunction during data collection or human error during data pre-processing. It is important to deal with missing values before analysing data, since ignoring or omitting them may result in biased or misinformed analysis. Several approaches for handling missing values have been proposed in the literature. In this paper, we aggregate some of the literature on missing data, focusing particularly on machine learning techniques. We also give insight into how the machine learning approaches work by highlighting the key features of the proposed techniques, how they perform, their limitations, and the kind of data they are most suitable for. Finally, we experiment with the k nearest neighbor and random forest imputation techniques on novel power plant fan data with induced missing values, and we offer some possible directions for future research.
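The evaluation protocol described in the abstract (induce missing values at a chosen rate, impute, compare against the known truth) can be illustrated with a minimal sketch. This is not the authors' implementation: it covers only k nearest neighbor imputation, not missForest, and it runs on a small synthetic data set invented for illustration rather than on the Iris or fan data. The function names (`induce_mcar`, `knn_impute`, `rmse`), the choice of `k=3`, the rescaled partial Euclidean distance, and the use of RMSE as the error measure are all assumptions.

```python
import math
import random


def induce_mcar(rows, rate, rng):
    """Copy `rows` and set about `rate` of all cells to None, uniformly at random."""
    cells = [(i, j) for i in range(len(rows)) for j in range(len(rows[0]))]
    masked = [list(r) for r in rows]
    for i, j in rng.sample(cells, round(rate * len(cells))):
        masked[i][j] = None
    return masked


def distance(a, b):
    """Euclidean distance over features observed in both rows.

    The squared sum is rescaled by (total features / shared features) so rows
    with few shared features are not favoured; returns inf if none are shared.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(rows, k=3):
    """Fill each None with the mean of that feature over the k nearest donor rows."""
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are other rows that observe feature j and share at least
            # one observed feature with the current row.
            donors = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors = sorted((d for d in donors if d[0] != math.inf), key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if not nearest:
                # Fallback: mean of the observed values in column j.
                nearest = [r[j] for r in rows if r[j] is not None]
            if not nearest:
                raise ValueError(f"feature {j} has no observed values")
            imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


def rmse(truth, masked, imputed):
    """Root mean squared error over the cells that were masked."""
    errors = [
        (truth[i][j] - imputed[i][j]) ** 2
        for i in range(len(truth))
        for j in range(len(truth[0]))
        if masked[i][j] is None
    ]
    return math.sqrt(sum(errors) / len(errors)) if errors else 0.0


# Synthetic data (hypothetical): two clusters of 20 rows with three correlated features.
rng = random.Random(0)
data = [
    [c + rng.gauss(0, 0.1), 2 * c + rng.gauss(0, 0.1), -c + rng.gauss(0, 0.1)]
    for c in (0.0, 5.0)
    for _ in range(20)
]

for rate in (0.05, 0.10, 0.20):
    masked = induce_mcar(data, rate, rng)
    filled = knn_impute(masked, k=3)
    print(f"missingness {rate:.0%}: RMSE = {rmse(data, masked, filled):.3f}")
```

Masking cells uniformly at random corresponds to the missing completely at random mechanism only; evaluating under the other two mechanisms would need a masking rule that depends on observed or unobserved values.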
Honeypots are special tools designed to help track and understand attackers' motives and attack methods. Several honeypots have been developed for web applications, and some have since been abandoned by their developers. As honeypots are deployed more widely within computer networks, malicious attackers devise techniques to detect and circumvent them, thereby exposing limitations in most web application honeypots. Dynamic honeypots, however, are believed to be the future of honeypots because they can adjust to changing environments. Glastopf is one of the most popular, if not the most popular, dynamic web application honeypots currently released to the public. But Glastopf has its limitations too. Once deployed, it can be easily identified by attackers because of its performance and appearance, and so becomes less useful to the security community. This research describes some of the limitations inherent in Glastopf and then proposes possible ways to make it more deceptive and more attractive to attackers.