Computer Science, Communication and Instrumentation Devices 2014
DOI: 10.3850/978-981-09-5247-1_017
Methods to Avoid Over-Fitting and Under-Fitting in Supervised Machine Learning (Comparative Study)

Cited by 151 publications (105 citation statements)
References 13 publications
“…ANNs face several issues that reduce their performance or distort results. Among them are overfitting (Allamy, 2014; Zhang et al., 2018) and underfitting (Allamy, 2014), data scarcity, the need for normalization, data imbalance and outlier influence (Khamis, Ismail, Khalid, & Tarmizi Mohammed, 2005). These issues were addressed using methods such as dropout (Park & Kwak, 2017), augmentation (jitter (pure Gaussian noise) and warp (Gaussian noise on Bézier curves)) (Le Guennec, Malinowski, & Tavenard, 2016; Um et al., 2017; Velasco, Garnica, Lanchares, Botella, & Ignacio Hidalgo, 2018; Xiao & Xu, 2012), synthetic minority oversampling technique (SMOTE) (Fernández, García, Herrera, & Chawla, 2018), interquartile range (IQR) scaling (Mizera et al., 2004) and median absolute deviation (MAD) (Gorard, 2013) based Gaussian noise data completion.…”
Section: ANNs
confidence: 99%
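The jitter augmentation cited above amounts to perturbing each training series with pure Gaussian noise (the warp variant instead draws the noise along smooth Bézier curves). Here is a minimal NumPy sketch of the jitter variant; the function name and the sigma default are illustrative assumptions, not values from the cited papers:

```python
import numpy as np

def jitter(x, sigma=0.03, rng=None):
    """Jitter augmentation: add pure Gaussian noise to each sample.

    sigma is a hypothetical default; the cited papers tune it per dataset.
    """
    rng = rng or np.random.default_rng()
    return x + rng.normal(loc=0.0, scale=sigma, size=x.shape)

# Usage: double a toy training set with noisy copies.
X_train = np.random.rand(100, 64)               # 100 series of length 64
X_augmented = np.vstack([X_train, jitter(X_train)])
```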
“…Over- and underfitting performance was originally considered as a measure for selecting the two best-performing ANN types, since over- and underfitted ANNs are not capable of generalizing appropriately to new data. Such networks either emulate the training data in an overly exact, ragged fashion (overfitting) or fail to react to each type of new data (underfitting) (Allamy, 2014; Zhang et al., 2018). The selection process was planned to be carried out via analysis of R² performance.…”
Section: Network Metrics
confidence: 99%
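One way to operationalize the R² selection criterion described above, sketched on toy data rather than the study's actual networks, is to compare R² on training and held-out data: a large gap between the two flags overfitting, while low scores on both flag underfitting:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Toy data standing in for the study's inputs and targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)

r2_train = r2_score(y_tr, model.predict(X_tr))
r2_test = r2_score(y_te, model.predict(X_te))
# Heuristic: overfitting if r2_train >> r2_test; underfitting if both are low.
print(f"train R^2 = {r2_train:.3f}, test R^2 = {r2_test:.3f}")
```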
“…If too few epochs are used, under-fitting can occur and the solutions can be of poor quality: the model fits neither the training data nor the test data well enough. On the other hand, using too many epochs can cause over-fitting: the model fits the training data too well and thus fails to fit the test data well enough (it lacks generalization capability), which prevents good performance on the test data [52, 53]. When over-fitting occurs, the error on the training set continues to decrease with further model learning, while the error on the test set starts increasing.…”
Section: Multi-objective Evolutionary Instance Selection for Regression
confidence: 99%
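The epoch trade-off in this passage is commonly handled with early stopping: keep training while validation error falls, and stop once it has not improved for a few epochs, i.e., exactly when the over-fitting signature described above appears. A framework-agnostic sketch; `train_one_epoch` and `validation_loss` are assumed placeholder callables supplied by the caller:

```python
def train_with_early_stopping(train_one_epoch, validation_loss,
                              max_epochs=500, patience=10):
    """Stop when validation loss has not improved for `patience` epochs.

    `train_one_epoch()` runs one pass over the training data;
    `validation_loss()` returns the current loss on held-out data.
    Both are hypothetical callables supplied by the caller.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch in range(max_epochs):
        train_one_epoch()
        loss = validation_loss()
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # Validation error has stopped improving: the over-fitting
            # signature described in the quoted passage.
            break
    return best_epoch, best_loss
```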
“…Figures 4(a) and 4(b) show some outlier values in two attributes, “CLABSI: observed cases” and “patients who reported that their doctors sometimes or never communicated well,” respectively. DM models developed with outlier values yield very poor accuracy [59]. Therefore, during data preparation we excluded hospitals that have outlier values from our experimental dataset; this was done using the visual inspection method [60].…”
Section: DM for a Clinical Surveillance Program
confidence: 99%
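The quoted study removed outliers by visual inspection [60]; a common programmatic alternative (shown here as a sketch, not their method) is Tukey's interquartile-range rule:

```python
import numpy as np

def iqr_outlier_mask(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Usage: drop records whose attribute value is an outlier (toy data).
observed_cases = np.array([2.0, 3.1, 2.7, 2.9, 15.0, 3.3])
clean = observed_cases[~iqr_outlier_mask(observed_cases)]
```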
“…A brief description of these DM algorithms is given in Table 1. While developing our models, we took special care to avoid overfitting [59]. It is important to note that the model-building process was iterative.…”
Section: DM for a Clinical Surveillance Program
confidence: 99%
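A standard safeguard of the kind alluded to here is k-fold cross-validation, which scores a model only on folds it was not trained on, so an overfitted model cannot hide behind its training accuracy. A sketch on synthetic data; the decision tree and depth values are illustrative assumptions, not the study's DM algorithms:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the hospital dataset.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Compare an unconstrained (overfitting-prone) tree with a depth-limited one.
for depth in (None, 3):
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"max_depth={depth}: mean CV accuracy = {scores.mean():.3f}")
```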