The value of k in k-fold cross-validation is an essential element that affects the performance of machine learning predictive models. A well-chosen k yields better accuracy, while a poorly chosen k can degrade the model’s performance. In the literature, the most commonly used values of k are five (5) and ten (10), as these two values are believed to give test error rate estimates that suffer neither from extremely high bias nor from very high variance. However, there is no formal rule. To the best of our knowledge, few experimental studies have investigated the effect of diverse k values in training different machine learning models. This paper empirically analyses the prevalence and effect of distinct k values (3, 5, 7, 10, 15 and 20) on the validation performance of four well-known machine learning algorithms (MLAs): Gradient Boosting Machine (GBM), Logistic Regression (LR), Decision Tree (DT) and K-Nearest Neighbours (KNN). It was observed that the value of k and the model validation performance differ from one machine learning algorithm to another for the same classification task. However, our empirical results suggest that k = 7 offers a slight increase in validation accuracy and area under the curve (AUC) with lower computational complexity than k = 10 across most MLAs. We discuss the study outcomes in detail and outline some guidelines for beginners in the machine learning field on selecting the best k value and machine learning algorithm for a given task.
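The comparison described in this abstract can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the synthetic two-class data, the tiny 1-nearest-neighbour classifier, and the helper names `kfold_indices`, `one_nn_predict` and `cv_accuracy` are hypothetical and are not the paper's dataset, models, or code. It shows how the same data is split for each k and that the number of model fits grows linearly with k, which is the cost side of the k = 7 versus k = 10 trade-off.

```python
import random

def kfold_indices(n_samples, k, seed=0):
    """Shuffle range(n_samples), cut it into k folds, and yield (train, validation) index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation

def one_nn_predict(train_X, train_y, point):
    """Return the label of the closest training point (squared Euclidean distance)."""
    best = min(range(len(train_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], point)))
    return train_y[best]

def cv_accuracy(X, y, k, seed=0):
    """Mean validation accuracy of the 1-NN classifier over k folds."""
    scores = []
    for train, validation in kfold_indices(len(X), k, seed):
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct = sum(one_nn_predict(train_X, train_y, X[i]) == y[i] for i in validation)
        scores.append(correct / len(validation))
    return sum(scores) / len(scores)

# Synthetic two-class data: an assumption for illustration, not the paper's dataset.
rng = random.Random(42)
X = [(rng.gauss(c, 1.0), rng.gauss(c, 1.0)) for c in (0, 3) for _ in range(60)]
y = [label for label in (0, 1) for _ in range(60)]

# One model fit per fold, so a run with k folds costs k fits.
for k in (3, 5, 7, 10, 15, 20):
    print(f"k={k:2d}  model fits={k:2d}  mean validation accuracy={cv_accuracy(X, y, k):.3f}")
```

In practice a library routine (for example a stratified k-fold splitter) would normally replace the hand-written split; the hand-written version is used here only to keep the sketch self-contained.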
The availability of digital technology in the hands of citizens worldwide makes available an unprecedented, massive amount of data. The capability to process these gigantic amounts of data in real time with Big Data Analytics (BDA) tools and Machine Learning (ML) algorithms carries many benefits. However, the high number of free BDA tools, platforms, and data mining tools makes it challenging to select the appropriate one for a given task. This paper presents a comprehensive mini-literature review of ML in BDA. Using a keyword search, a total of 1512 published articles were identified. The articles were screened down to 140 based on the study’s proposed novel taxonomy. The study outcome shows that deep neural networks (15%), support vector machines (15%), artificial neural networks (14%), decision trees (12%), and ensemble learning techniques (11%) are widely applied in BDA. The related application fields, challenges, and, most importantly, the openings for future research are detailed.
The distribution transformer is a crucial element in determining the power flow in large power systems. Better transformer performance implies high power system efficiency and enhanced power transfer capability. However, various distribution transformer failures in the recent past have led to power supply disturbances and have attracted much attention from the electrical engineering community. It is of considerable significance to accurately determine the running state of distribution transformers and to detect potential transformer faults in time. This project work presents a predictive model to predict whether a distribution transformer is likely to fail before its expected years in service. Using Random Forest machine learning techniques, we examine transformer data from August 2010 to June 2019. Our experimental results reveal that a total of 90 distribution transformers were damaged within nine years. Thus, on average the company loses ten (10) transformers a year, which amounts to US $92300-95770 per year. Also, most of the places that recorded a high rate of distribution transformer damage were locations with mini and major factories nearby. Thus, the Sunyani Municipality recorded the highest transformer damage (12), representing 13%, followed by Mim (10). Again, lightning strikes were a significant cause of transformer damage: twenty-one (21) of the ninety (90) damaged transformers were damaged by a lightning strike. The results further show that 33.33% of the damaged transformers were within 24.75-36.75% of their life expectancy. As few as 3.33% of the damaged transformers had been in service for 73% of their life expectancy. From the study results, it can be concluded that a high percentage (68.9%) of the damaged transformers in the Bono, Bono East and Ahafo regions of Ghana had been in service for less than half of their expected years of service.
Rate-of-faulty-occurrence, Type-of-faults-sustained and Tap-changer-type are the most significant factors that determine the number of years left before a distribution transformer fails. We observed that the make of a transformer was of less importance in predicting the years left before a transformer fails. Finally, the RMSE of 0.001639 and MAPE of 0.001321 achieved by the proposed model show that it fits the dataset very well.
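For readers unfamiliar with the two error measures quoted above, the following minimal sketch shows how RMSE and MAPE are conventionally computed. The function names and the three sample values are hypothetical and are not taken from the study; the abstract also does not say whether its MAPE is reported as a fraction or a percentage, so the fraction form is assumed here.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared residual."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (multiply by 100 for percent).

    Every actual value must be nonzero, because each residual is divided by it.
    """
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical "years left before failure" values, for illustration only.
actual_years = [10.0, 20.0, 5.0]
predicted_years = [11.0, 18.0, 5.0]

print(f"RMSE = {rmse(actual_years, predicted_years):.4f}")
print(f"MAPE = {mape(actual_years, predicted_years):.4f}")
```

Both measures shrink toward zero as predictions approach the observed values, which is why small values are read as a close fit to the data they were computed on.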
Background: Most subscriber identification module (SIM) cards that find their way to mobile phone users are either unregistered or pre-registered. Criminals buy these SIM cards, which carry fake personal information, activate them and then use them as a channel for attacking vulnerable mobile phone users. Objective: To investigate the existing standards of the registration process, its weaknesses, and how fraudsters leverage the shortcomings of the existing registration process to attack unsuspecting subscribers. Methods: The study proposed an automated theoretical model as an augmented model to make the SIM registration process and its implementation secure. Results: In our investigation, we identified that there has been a rise in fraudulent activities in Ghana, and that criminals have adapted to the new trend of committing crime using mobile phones. The research presented a proposed conceptual model and algorithm for the new SIM registration. The study further conducted a comparative analysis of the principal component adopted to measure the robustness of the registration platform. The criminals mostly use social engineering tactics to trick their victims into disclosing sensitive information or sending money for services yet to be rendered. Mobile network operators (MNOs) request an ID card before registering and activating SIMs, yet criminals can outwit the registration processes and get SIM cards registered through unapproved channels. Conclusion: We found that the robustness of our model will prevent SIM pre-registration and unapproved SIM activation, owing to the verification mechanisms in the proposed model. A cognitive learning https://www.indjst.org/
The academic performance of students is essential for academic progression at all levels of education. However, the many cognitive and non-cognitive factors that influence students’ academic performance make it challenging for academic authorities to use conventional analytical tools to extract hidden knowledge from educational data. Therefore, Educational Data Mining (EDM) requires computational techniques to simplify planning and to identify students who might be at risk of failing or dropping out of school due to academic performance, thus helping to address student retention. The paper studies several cognitive and non-cognitive factors, such as academic, demographic, social and behavioural factors, and their effect on student academic performance using machine learning algorithms. Heterogeneous lazy and eager machine learning classifiers, including Decision Tree (DT), K-Nearest-Neighbour (KNN), Artificial Neural Network (ANN), Logistic Regression (LR), Random Forest (RF), AdaBoost and Support Vector Machine (SVM), were adopted, and training was performed using k-fold (k = 10) and leave-one-out cross-validation. We evaluated their predictive performance using well-known evaluation metrics: Area under Curve (AUC), F-1 score, Precision, Accuracy, Kappa, Matthews correlation coefficient (MCC) and Recall. The study outcome shows that Student Absence Days (SAD) is the most significant predictor of students’ academic performance. In terms of prediction accuracy and AUC, the RF (Acc = 0.771, AUC = 0.903), LR (Acc = 0.779, AUC = 0.90) and ANN (Acc = 0.760, AUC = 0.895) outperformed all other algorithms (KNN (Acc = 0.638, AUC = 0.826), SVM (Acc = 0.727, AUC = 0.80), DT (Acc = 0.733, AUC = 0.876) and AdaBoost (Acc = 0.748, AUC = 0.808)), making them more suitable for predicting students’ academic performance.
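Most of the evaluation metrics listed in this abstract can be derived from a confusion matrix. The sketch below does so for the binary case only, which is a simplifying assumption since the abstract does not state the number of classes; the function name `binary_metrics` and the counts are hypothetical and are not results from the study. AUC is omitted because it needs ranked prediction scores rather than counts.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Compute common classification metrics from binary confusion-matrix counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Cohen's kappa: agreement beyond what the class marginals would give by chance.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected)
    # Matthews correlation coefficient: correlation between predicted and true labels.
    mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "kappa": kappa, "mcc": mcc}

# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name:9s} = {value:.3f}")
```

Kappa and MCC are useful alongside accuracy because both correct for class imbalance, which accuracy alone can hide.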