2015
DOI: 10.15282/ijsecs.1.2015.6.0006

Evaluating the Effect of Dataset Size on Predictive Model Using Supervised Learning Technique

Abstract: Learning models used for prediction are mostly developed without much attention to the dataset size needed to produce models with high accuracy and good generalization. The general belief is that a large dataset is needed to construct a predictive learning model. Whether a dataset counts as large, however, depends on the circumstances, so what makes a dataset big or small is vague. In this paper, the ability of the predictive model to generali…

Citations: cited by 56 publications (39 citation statements)
References: 13 publications
“…On the whole, the efficiency of the prediction process was found to be approximately 96.365%, indicating that there is a slight deviation between the predictions and the actual values (Fig 5). This can possibly be rectified by increasing the training sample size as generally, it is inferred that using sufficient data set for predictive model construction can lead to better accuracy [66].…”
Section: Results (mentioning)
confidence: 99%
“…Dataset size is also known to affect the performance of machine learning models (e.g., Ajiboye et al, 2015; Raudys & Jain, 1991). However, the error rates of machine learning models have also been shown to decrease for sample sizes of > 100 observations (Ajiboye et al, 2015). Similar analyses applied using a range of datasets show that accurate results can be achieved for groups with small sample sizes (Bland et al., 2015).…”
Section: Discussion (mentioning)
confidence: 99%
“…We would like to note that the results shown in Section 4.2 are based on the test dataset and showed improved results by significantly reducing random and systematic errors, which indicates that our model is successfully calibrated and could potentially be useful to predict the independent hydrometeorological dataset. The machine learning-based error models can manipulate the training data in such a way that the actual results expected from the untrained dataset can be quite different from the evaluated results using the training dataset [51,52,94]. Therefore, we considered the representation of extreme (>95th and <25th) precipitation values in the training and testing dataset to make sure that it covered the entire range of the dataset.…”
Section: Discussion (mentioning)
confidence: 99%