Data Cleaning for Classification Using Misclassification Analysis

Akl et al. 2020

Sensors

Predicting the results of soccer competitions, and in particular the contributions of individual match attributes, has gained popularity in recent years. Processing the large volumes of linear and non-linear data obtained from sensors, cameras and analysis systems requires modern tools that can provide a deep understanding of the relationships within that data. Conventional data mining tools do not appear sufficient to explain the relationship between match attributes and results, or to show how results can be predicted or optimized from performance variables. This study aimed to suggest a different approach: predict wins and losses together with the attributes’ sensitivities, which in a second step enables the prediction of match results from the most sensitive attributes. A radial basis function neural network model weighted the effectiveness of all match attributes and classified the team results into the target groups of win or loss. The model correctly classified 83.3% of wins and 72.7% of losses, with a low root mean square training error of 2.9% and testing error of 0.37%. Out of 75 match attributes, 19 were identified as powerful predictors of success. The most powerful, in descending order, were: the Total Team Medium Pass Attempted (MBA) 100%; the Distance Covered Team Average in zone 3 (15–20 km/h; Zone3_TA) 99%; the Team Average ball delivery into the attacking third of the field (TA_DAT) 80.9%; the Total Team Covered Distance without Ball Possession (Not in_Poss_TT) 76.8%; and the Average Distance Covered by Team (Game TA) 75.1%. Therefore, the novel radial basis function neural network model can be employed by sports scientists to adapt training, tactics and opposition analysis to improve performance.
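The two kinds of figures quoted in this abstract, per-class correct percentages and a root mean square error, can be computed as in the minimal Python sketch below. The function names and the toy labels are illustrative assumptions; they are not the study's data or code, and the abstract does not say exactly how its error percentages were scaled.

```python
import math


def per_class_correct_percentage(y_true, y_pred, label):
    """Percent of samples whose true class is `label` that were predicted as `label`."""
    idx = [i for i, t in enumerate(y_true) if t == label]
    if not idx:
        raise ValueError(f"no samples with true label {label!r}")
    hits = sum(1 for i in idx if y_pred[i] == label)
    return 100.0 * hits / len(idx)


def rmse(targets, outputs):
    """Root mean square error between numeric targets and model outputs."""
    if len(targets) != len(outputs) or not targets:
        raise ValueError("targets and outputs must be non-empty and equally long")
    squared = sum((t - o) ** 2 for t, o in zip(targets, outputs))
    return math.sqrt(squared / len(targets))


# Made-up toy labels, only to exercise the functions (not the World Cup data).
y_true = ["win", "win", "win", "loss", "loss", "loss", "loss"]
y_pred = ["win", "win", "loss", "loss", "loss", "loss", "win"]

win_pct = per_class_correct_percentage(y_true, y_pred, "win")    # 2 of 3 wins correct
loss_pct = per_class_correct_percentage(y_true, y_pred, "loss")  # 3 of 4 losses correct
```

Reporting the two classes separately, as the abstract does, shows whether a model is better at recognising wins than losses, which a single overall accuracy would hide.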

Section: The ANN Training and Testing Procedures

mentioning

confidence: 99%

Predicting Wins, Losses and Attributes’ Sensitivities in the Soccer World Cup 2018 Using Neural Network Analysis

Akl et al. 2020

Sensors

“…For that, some of the data cleaning steps are applied. These steps are very important to have high-quality datasets because unclean data can decrease the classification or regression model accuracies [42]. Fig.…”

Section: B Data Preprocessing

mentioning

confidence: 99%

Forecasting the Global Horizontal Irradiance based on Boruta Algorithm and Artificial Neural Networks using a Lower Cost

Alresheedi

Abdullah

2020

IJACSA

More solar-based electricity generation stations have been established in recent years, as solar power has become a new and important source of renewable energy. Efficient, reliable integration of solar power must overcome several challenges, such as forecasting and the costly equipment in meteorological stations. One effective prediction approach combines Artificial Neural Networks (ANN) with the Boruta algorithm for optimal attribute selection, training the proposed model to achieve highly accurate predictions at a lower cost. The precise goal of this research is to predict the Global Horizontal Irradiance (GHI) by building an ANN model, while also reducing the total number of GHI prediction attributes (features) and consequently the cost of the devices and equipment required to predict this important factor. The dataset used in this research is real data, collected from 2015 to 2018 by solar and meteorological stations in KSA and provided by King Abdullah City for Atomic and Renewable Energy (KA CARE). The findings show that accurate predictions of solar radiation can be achieved at minimum cost, which is highly important in KSA and in other countries with a similar environment.
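The core idea behind Boruta-style attribute selection, comparing each real feature against shuffled "shadow" copies, can be illustrated with the simplified Python sketch below. It rests on stated assumptions: absolute Pearson correlation stands in for the random-forest importance that Boruta actually uses, a feature is kept only if it beats the best shadow in every trial (Boruta applies a statistical test instead), and the data are synthetic rather than the KA CARE measurements. All function and column names are hypothetical.

```python
import math
import random


def abs_pearson(xs, ys):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def shadow_feature_selection(columns, target, n_trials=20, seed=0):
    """Keep features whose importance beats the best shuffled copy in every trial."""
    rng = random.Random(seed)
    importance = {name: abs_pearson(col, target) for name, col in columns.items()}
    wins = {name: 0 for name in columns}
    for _ in range(n_trials):
        best_shadow = 0.0
        for col in columns.values():
            shadow = col[:]        # copy the column, then shuffle it so that
            rng.shuffle(shadow)    # any real link to the target is destroyed
            best_shadow = max(best_shadow, abs_pearson(shadow, target))
        for name in columns:
            if importance[name] > best_shadow:
                wins[name] += 1
    return [name for name in columns if wins[name] == n_trials]


# Synthetic data: one column tracks the target closely, the other is pure noise.
data_rng = random.Random(1)
target = [data_rng.gauss(0, 1) for _ in range(200)]
columns = {
    "informative": [t + data_rng.gauss(0, 0.1) for t in target],
    "noise": [data_rng.gauss(0, 1) for _ in target],
}
selected = shadow_feature_selection(columns, target)
```

Fewer selected attributes mean fewer sensors to buy and maintain, which is the cost argument the abstract makes.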

The 2012 International Joint Conference on Neural Networks (IJCNN)

“…In order to apply CMTNN to perform under-sampling [16], Truth NN and Falsity NN are employed to detect and remove misclassification patterns from a training set in the following steps:…”

Section: Target Outputs

mentioning

confidence: 99%

Enhancing classification performance of multi-class imbalanced data using the OAA-DB algorithm

Jeatrakul

Wong

2012

Self Cite

In data classification, the problem of imbalanced class distribution has attracted much attention. Most efforts have investigated the problem mainly for binary classification. However, research solutions for imbalanced data in binary-class problems are not directly applicable to multi-class applications. Therefore, it is a challenge to handle the multi-class problem with imbalanced data in order to obtain satisfactory results. This problem can indirectly affect how humans visualise the data. In this paper, an algorithm named One-Against-All with Data Balancing (OAA-DB) is developed to enhance the classification performance in the case of multi-class imbalanced data. This algorithm is developed by combining the multi-binary classification technique called One-Against-All (OAA) and a data balancing technique. In the experiment, the three multi-class imbalanced data sets used were obtained from the University of California Irvine (UCI) machine learning repository. The results show that the OAA-DB algorithm can enhance the classification performance for multi-class imbalanced data without reducing the overall classification accuracy.
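The One-Against-All decomposition named in this abstract can be sketched in Python as below. It is a minimal illustration under assumptions: the function names and toy labels are hypothetical, the highest-score rule for combining the binary classifiers is the common OAA convention rather than something the abstract states, and the data balancing step of OAA-DB is not implemented because the abstract does not describe it.

```python
from collections import Counter


def one_against_all_labels(labels):
    """Map each class to a binary label list: 1 for that class, 0 for all others."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def imbalance_ratio(binary_labels):
    """Number of negative samples per positive sample in one binary sub-problem."""
    counts = Counter(binary_labels)
    if counts[1] == 0:
        raise ValueError("binary sub-problem has no positive samples")
    return counts[0] / counts[1]


def predict_one_against_all(scores_by_class):
    """Pick the class whose binary classifier reports the highest score."""
    return max(scores_by_class, key=scores_by_class.get)


# Toy multi-class labels with a skewed distribution (illustrative only).
labels = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
binary = one_against_all_labels(labels)
ratios = {c: imbalance_ratio(b) for c, b in binary.items()}
```

In this toy set the minority class "c" yields a binary sub-problem with nine negatives for each positive, which illustrates why a one-against-all split tends to need a balancing step before training.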