Dimane Mpoeleng scite author profile

Machine learning has been the cornerstone of analysing and extracting information from data, and the problem of missing values is often encountered. Missing values arise under different mechanisms: missing completely at random, missing at random, or missing not at random. All of these may result from system malfunction during data collection or human error during data pre-processing. Nevertheless, it is important to deal with missing values before analysing data, since ignoring or omitting them may result in biased or misinformed analysis. Several approaches for handling missing values have been proposed in the literature. In this paper, we aggregate some of the literature on missing data, focusing particularly on machine learning techniques. We also give insight into how the machine learning approaches work by highlighting the key features of missing-value imputation techniques, how they perform, their limitations, and the kind of data they are most suitable for. We propose and evaluate two methods: k-nearest neighbor and an iterative imputation method (missForest) based on the random forest algorithm. Evaluation is performed on the Iris dataset and novel power plant fan data with induced missing values at missingness rates of 5% to 20%. We show that both missForest and k-nearest neighbor can successfully handle missing values, and we offer some possible future research directions.


A Survey On Missing Data in Machine Learning

Emmanuel

Maupong

Mpoeleng

et al. 2021

Preprint


Machine learning has been the cornerstone of analysing and extracting information from data, and the problem of missing values is often encountered. Missing values arise under different mechanisms: missing completely at random, missing at random, or missing not at random. All of these may result from system malfunction during data collection or human error during data pre-processing. Nevertheless, it is important to deal with missing values before analysing data, since ignoring or omitting them may result in biased or misinformed analysis. Several approaches for handling missing values have been proposed in the literature. In this paper, we aggregate some of the literature on missing data, focusing particularly on machine learning techniques. We also give insight into how the machine learning approaches work by highlighting the key features of the proposed techniques, how they perform, their limitations, and the kind of data they are most suitable for. Finally, we experiment with the k-nearest neighbor and random forest imputation techniques on novel power plant fan data with induced missing values, and we offer some possible future research directions.


Human–computer interface design issues for a multi-cultural and multi-lingual English speaking country — Botswana

Onibere

Morgan

Busang

et al. 2001

Interacting with Computers


Review on methods used for wildlife species and individual identification

2021


Multi-greedy geographic packets forwarding using flow-based indicators

Oladeji-Atanda

Mpoeleng

Ogwu

2021

ICST Transactions on Mobile Communications and Applications


The MANET packet routing method of geographic greedy forwarding selects intermediate relays that reduce the distance towards a destination. The efficacy of the greedy methods varies; nevertheless, the algorithms are similar and process the same data at a forwarding node. These commonalities potentially allow different methods to be assigned online for more efficient forwarding progress in heterogeneous MANET environments. In this paper, we define a multi-method, multi-greedy packet forwarding approach. Using IPFIX packet flow measures, we demonstrate the multi-greedy scheme on repetitive packet routing tasks that permit the application of exploration-exploitation. The flow reports reveal the optimal efficiency of each base greedy method in each flow, which aggregates into the multi-greedy design. In comparison to the base methods, the multi-greedy methods in the case study show considerable performance improvement in PDR, hop-count, and delay measures.


