2019
DOI: 10.1088/1757-899x/523/1/012070

The relationship between data skewness and accuracy of Artificial Neural Network predictive model

Abstract: The purpose of this study is to investigate the relationship between data skewness in the output variable and the accuracy of an artificial neural network predictive model. The artificial neural network predictive model is built using a multilayer perceptron with one output variable and six input variables, and the training algorithm is backpropagation. The data used in this study are generated by running simulations over 1000 cycles. Three categories of skewness used in the output variable are positive skew…
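The abstract does not include the authors' code; the sketch below is only a hypothetical illustration of the setup it describes, assuming scikit-learn's MLPRegressor stands in for the multilayer perceptron and SciPy's skew for measuring output-variable skewness. The data-generating process is invented for the example.

```python
import numpy as np
from scipy.stats import skew
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Simulated data: six input variables and one output variable,
# mirroring the model structure described in the abstract.
X = rng.normal(size=(1000, 6))
# A lognormal component makes the output variable positively skewed.
y = rng.lognormal(mean=0.0, sigma=1.0, size=1000) + X @ rng.normal(size=6)

print("output skewness:", skew(y))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Multilayer perceptron trained with a gradient-based (backpropagation) solver.
model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
model.fit(X_train, y_train)

print("R^2 on held-out data:", r2_score(y_test, model.predict(X_test)))
```

Repeating such a run for output variables of different skewness categories is one way to probe the skewness-accuracy relationship the study examines.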

Cited by 9 publications (2 citation statements)
References 4 publications
“…There were also 225 non-annotations, which were shuffled, split equally into three separate sets, and added to each annotator's dataset in order to train the models to distinguish deontic from non-deontic sentences. A normal data distribution and neutral skewness (skewness between -0.05 and 0.05) were preferred, as they generally lower the misclassification rate and bias of classifiers (Trafimow et al., 2018; Larasati et al., 2019; Liu et al., 2019). Since the datasets were highly skewed towards the 'Obligation' class (> 2 right-skewness), we randomly undersampled the obligations so that each dataset contained fewer than 100 instances of obligations, reducing the skewness to around 1.…”
Section: Data Collection and Annotation (mentioning)
confidence: 99%
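The cited passage checks label skewness and randomly undersamples the majority class until the skew drops; the sketch below is a generic illustration of that idea, not the authors' code. The label array, the integer coding of classes as a skewness proxy, and the target count of 99 are assumptions chosen for the example.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)

# Hypothetical label array, heavily dominated by the 'Obligation' class,
# mirroring the imbalance described in the cited passage.
labels = np.array(['Obligation'] * 300 + ['Permission'] * 60 + ['Prohibition'] * 40)

def label_skewness(labels):
    # Skewness of the integer-coded labels: a rough proxy for how
    # lopsided the class distribution is.
    _, coded = np.unique(labels, return_inverse=True)
    return skew(coded)

print("skewness before undersampling:", label_skewness(labels))

# Randomly undersample the majority class to fewer than 100 instances.
obligation_idx = np.flatnonzero(labels == 'Obligation')
keep_obligation = rng.choice(obligation_idx, size=99, replace=False)
keep = np.concatenate([keep_obligation, np.flatnonzero(labels != 'Obligation')])
balanced = labels[rng.permutation(keep)]

print("skewness after undersampling:", label_skewness(balanced))
```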
“…The quality of data is an important aspect for data scientists and statisticians, who aim to understand the distribution(s) present in the data so that appropriate measures and procedures can be applied for better interpretation of the results (Varshney, 2020). While the Shapiro-Wilk normality test is one of the available data normality testing techniques (Malato, 2022; Royston, 1983; Royston, 1992; Yazici & Yolacan, 2007), herein we employed quantile-quantile (QQ) plots, which visualize the distribution of a random variable by plotting it on the y-axis against the normal distribution on the x-axis: if the quantile points lie along the straight line y = x, the distribution is normal; if the right side sits above the y = x line and the left side is around it, the distribution is right-skewed; and if the right side is around the line and the left side falls below it, the distribution is left-skewed (Chan, 2022; Larasati et al., 2019; Varshney, 2020). This determination indicates whether a data normalization procedure is required before applying analytical and modeling techniques that work best with Gaussian distributions and before calibrating the resultant models; remedies such as stratified sampling would only help if the issue were one of class imbalance.…”
Section: (ii) Dataset Quality Testing (mentioning)
confidence: 99%
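A minimal sketch of the two checks mentioned in the passage, assuming SciPy's shapiro test for normality and probplot for the QQ visualization; the sample data are invented for the example.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)

# Invented sample: a right-skewed variable, as in the passage's example case.
sample = rng.lognormal(mean=0.0, sigma=0.8, size=500)

# Shapiro-Wilk normality test: a small p-value suggests the data are not normal.
stat, p_value = stats.shapiro(sample)
print(f"Shapiro-Wilk W = {stat:.3f}, p = {p_value:.4f}")

# QQ plot: ordered sample values (y-axis) against theoretical normal
# quantiles (x-axis). Points bending above the reference line on the
# right-hand side indicate right skew.
stats.probplot(sample, dist="norm", plot=plt)
plt.title("QQ plot against the normal distribution")
plt.show()
```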