Machine Learning Methods with Noisy, Incomplete or Small Datasets

Caiafa, César F.; Sun, Zhe; Tanaka, Toshihisa; Martí-Puig, Pere; Solé-Casals, Jordi

doi:10.3390/app11094132

Cited by 19 publications (12 citation statements)

References 15 publications (17 reference statements)


“…The process of creating and assessing these models consists of feature selection, optimization of model parameters using the training data, and evaluation of the model using the test data, in the training and testing phases, respectively. In addition, the synthetic minority oversampling technique [23] is applied to balance the training data. This process divides the data set into 10 non-overlapping folds.…”

Section: Methods (mentioning, confidence: 99%)
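The statement above applies the synthetic minority oversampling technique (SMOTE) [23] to balance only the training data. The core of SMOTE is interpolation: each synthetic sample lies on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. The following is a minimal numpy sketch of that step, assuming numeric features, Euclidean distance and a single minority class; the function names `smote` and `balance_with_smote` are hypothetical and are not taken from the cited paper or from any library.

```python
import numpy as np


def smote(X_min, n_synthetic, k=5, rng=None):
    """Create n_synthetic points by SMOTE-style interpolation (sketch).

    Each new point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority-class neighbours.
    """
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n - 1)
    # Pairwise squared Euclidean distances within the minority class only.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]
    # Pick a seed sample and one of its neighbours for every new point.
    seeds = rng.integers(0, n, size=n_synthetic)
    partners = neighbours[seeds, rng.integers(0, k, size=n_synthetic)]
    gap = rng.random((n_synthetic, 1))  # interpolation factor in [0, 1)
    return X_min[seeds] + gap * (X_min[partners] - X_min[seeds])


def balance_with_smote(X_train, y_train, minority_label, k=5, rng=None):
    """Oversample one minority class of the TRAINING data up to the majority count."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    labels, counts = np.unique(y_train, return_counts=True)
    deficit = counts.max() - counts[labels == minority_label][0]
    if deficit <= 0:
        return X_train, y_train
    X_new = smote(X_train[y_train == minority_label], deficit, k=k, rng=rng)
    y_new = np.full(deficit, minority_label, dtype=y_train.dtype)
    return np.vstack([X_train, X_new]), np.concatenate([y_train, y_new])
```

Because the synthetic points are derived from training samples, the helper should be called on each training split only, never on the test data. In practice, the imbalanced-learn package provides a maintained `SMOTE` implementation.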

Prognosis of COVID‐19 patients using lab tests: A data mining approach

Khounraz, Khodadoost, Gholamzadeh et al., 2023, Health Science Reports


Background: The rapid spread of coronavirus disease 2019 (COVID‐19) has caused a worldwide pandemic and affected the lives of millions. The potential fatality of the disease has raised global public health concerns. Beyond clinical practice, artificial intelligence (AI) has provided new models for the early diagnosis and prediction of disease based on machine learning (ML) algorithms. In this study, we aimed to build a prediction model for the prognosis of COVID‐19 patients using data mining techniques. Methods: The data set was obtained from the intelligent management system repository of 19 hospitals of Shahid Beheshti University of Medical Sciences in Iran. All admitted patients had positive polymerase chain reaction (PCR) test results and were hospitalized between February 19 and May 12, 2020. The extracted data set contains 8621 instances, comprising demographic information and the results of 16 laboratory tests. The data were first preprocessed; then, among 15 laboratory tests, four were selected. Models were built with seven data mining algorithms, and their performances were compared. Results: The Random Forest (RF) and Gradient Boosted Trees models performed best, with accuracies of 86.45% and 84.80%, respectively, whereas the Decision Tree had the lowest accuracy (75.43%) of the seven models. Conclusion: Data mining methods could be used to predict outcomes of COVID‐19 patients from lab tests and demographic features. Once validated, these methods could be implemented in clinical decision support systems to improve the management and care of patients with severe COVID‐19.
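The model comparison described in this abstract, combined with the 10 non-overlapping folds mentioned in the citation statement, can be sketched with scikit-learn's stratified k-fold cross-validation. This is an illustration only: the hospital data set is not available here, so `make_classification` generates a hypothetical stand-in with four informative features, only three of the seven algorithms are shown, and the printed accuracies say nothing about the study's reported results. The helper name `compare_models` is likewise hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in data: 4 informative features play the role of the
# four selected lab tests; the 80/20 class imbalance is an arbitrary assumption.
X, y = make_classification(n_samples=600, n_features=6, n_informative=4,
                           n_redundant=0, weights=[0.8, 0.2], random_state=0)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosted Trees": GradientBoostingClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

# 10 non-overlapping folds that preserve the class ratio; every sample is
# used for testing exactly once.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)


def compare_models(models, X, y, cv):
    """Return (mean, std) of per-fold accuracy for each model."""
    results = {}
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
        results[name] = (scores.mean(), scores.std())
    return results


results = compare_models(models, X, y, cv)
for name, (mean, std) in sorted(results.items(), key=lambda kv: -kv[1][0]):
    print(f"{name:<24} accuracy {mean:.3f} +/- {std:.3f}")
```

If oversampling is used as in the citing study, it belongs inside each training fold (for example, imbalanced-learn's `Pipeline` wrapping `SMOTE` and the classifier), so that synthetic samples are never generated from test-fold data.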



“…Despite the growing pervasiveness of big data, accessing a high-quality training set remains challenging. Data sharing agreements, privacy violations [584], [585], noise [586], [587], poor data quality (fitness for purpose) [588], data imbalance [589], and the lack of annotated datasets are among the challenges businesses face when seeking raw data. Methods proposed to alleviate these problems include oversampling, undersampling, and dynamic sampling [590] for imbalanced data; surrogate losses, data cleaning, and finding the noise distribution for learning from noisy labels in noisy datasets; and active learning [591] for the lack of annotated data.…”

Section: Scalability (mentioning, confidence: 99%)
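Of the remedies listed above for imbalanced data, random oversampling and random undersampling are the simplest. A minimal numpy sketch follows, assuming a single-label classification problem; `random_resample` is a hypothetical helper, and the dynamic sampling of [590] is not shown.

```python
import numpy as np


def random_resample(X, y, strategy="over", rng=None):
    """Balance class counts by random over- or undersampling (sketch).

    strategy="over":  duplicate rows of smaller classes (with replacement)
                      until every class matches the largest class.
    strategy="under": drop rows of larger classes (without replacement)
                      until every class matches the smallest class.
    """
    rng = np.random.default_rng(rng)
    X, y = np.asarray(X), np.asarray(y)
    labels, counts = np.unique(y, return_counts=True)
    if strategy == "over":
        target = counts.max()
    elif strategy == "under":
        target = counts.min()
    else:
        raise ValueError("strategy must be 'over' or 'under'")
    keep = []
    for label, count in zip(labels, counts):
        idx = np.flatnonzero(y == label)
        if count == target:
            chosen = idx
        elif count < target:  # oversample: keep all rows, add random duplicates
            extra = rng.choice(idx, size=target - count, replace=True)
            chosen = np.concatenate([idx, extra])
        else:  # undersample: random subset without replacement
            chosen = rng.choice(idx, size=target, replace=False)
        keep.append(chosen)
    keep = np.concatenate(keep)
    return X[keep], y[keep]
```

The two strategies trade off differently: oversampling keeps every original sample but repeats minority rows, which can encourage overfitting to them, while undersampling avoids duplicates but discards majority-class data.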

Deep Representation Learning: Fundamentals, Technologies, Applications, and Open Challenges

Payandeh, Baghaei, Fayyazsanavi et al., 2023, IEEE Access


Machine learning algorithms have had a profound impact on the field of computer science over the past few decades. The performance of these algorithms heavily depends on the representations derived from the data during the learning process. Successful learning processes aim to produce concise, discrete, meaningful representations that can be effectively applied to various tasks. Recent advancements in deep learning models have proven to be highly effective in capturing high-dimensional, non-linear, and multi-modal characteristics. In this work, we provide a comprehensive overview of the current state-of-the-art in deep representation learning and the principles and developments made in the process of representation learning. Our study encompasses both supervised and unsupervised methods, including popular techniques such as autoencoders, self-supervised methods, and deep neural networks. Furthermore, we explore a wide range of applications, including image recognition and natural language processing. In addition, we discuss recent trends, key issues, and open challenges in the field. This survey endeavors to make a significant contribution to the field of deep representation learning, fostering its understanding and facilitating further advancements.
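Autoencoders, one of the techniques this survey covers, learn a representation by encoding inputs into a lower-dimensional code and decoding them back. For the special case of a linear encoder and decoder trained with squared reconstruction error, the optimal code spans the top-k principal subspace of the data, so a closed-form sketch can use PCA via the singular value decomposition. The `LinearAutoencoder` class below is a hypothetical illustration of that special case only, not of the deep, nonlinear autoencoders the survey discusses.

```python
import numpy as np


class LinearAutoencoder:
    """Closed-form linear autoencoder via PCA (truncated SVD), as a sketch."""

    def __init__(self, k):
        self.k = k  # size of the learned representation (bottleneck)

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        # Right singular vectors of the centred data are the principal directions.
        _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = vt[: self.k]  # shape (k, n_features)
        return self

    def encode(self, X):
        # Project centred inputs onto the k principal directions.
        return (np.asarray(X, dtype=float) - self.mean_) @ self.components_.T

    def decode(self, Z):
        # Map codes back to input space and undo the centring.
        return np.asarray(Z, dtype=float) @ self.components_ + self.mean_
```

If the data lie exactly on a k-dimensional affine subspace, `decode(encode(X))` reproduces `X` up to floating-point error; with a smaller `k`, the reconstruction error measures how much information the bottleneck discards.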


“…A small dataset can cause problems in supervised learning scenarios; this situation is termed a low-quality data problem [49]. Due to the lack of open and comprehensive solid waste generation data, we used the above-mentioned data set.…”

Section: Solid Waste Dataset (mentioning, confidence: 99%)

An Ensemble Learning Based Classification Approach for the Prediction of Household Solid Waste Generation

Namoun, Tufail, Alrehaili et al., 2022, Sensors


With increasing urbanization and smart-city initiatives, managing waste generation has become a fundamental task. Recent studies have begun applying machine learning techniques to forecast solid waste generation, helping authorities plan waste management processes efficiently, including collection, sorting, disposal, and recycling. However, identifying the best machine learning model for predicting solid waste generation is challenging, especially given limited datasets and a lack of important predictive features. In this research, we developed an ensemble learning technique that combines the advantages of (1) hyperparameter optimization and (2) a meta regressor model to accurately predict the weekly waste generation of households in urban cities. Hyperparameter optimization is performed with the Optuna framework, and the outputs of the optimized single machine learning models are used to train the meta linear regressor. The ensemble thus combines optimized machine learning models with different learning strategies. The proposed ensemble method achieved an R² score of 0.8 and a mean percentage error of 0.26, outperforming existing state-of-the-art approaches, including SARIMA, NARX, LightGBM, KNN, SVR, ETS, RF, XGBoost, and ANN, in predicting future waste generation. Our model outperformed not only the optimized single machine learning models but also the average ensemble of those models. Our findings suggest that the proposed ensemble learning technique can significantly improve the prediction of future household waste generation compared with individual learners, even on a feature-limited dataset. The practical implications for the research community and city authorities are also discussed.
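The stacking scheme described above, in which the predictions of several base models train a linear meta regressor, can be sketched with scikit-learn's `StackingRegressor`, alongside a `VotingRegressor` that simply averages the same base models (one reading of the "average ensemble" baseline). This sketch assumes synthetic data from `make_regression` and an arbitrary, hypothetical choice of base learners; the Optuna hyperparameter optimization used in the paper is omitted, so the scores it produces are not comparable to the reported R² of 0.8.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

# Hypothetical stand-in for weekly household waste data.
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Base learners with different learning strategies (arbitrary, untuned choices).
base_learners = [
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    ("knn", KNeighborsRegressor(n_neighbors=5)),
    ("svr", SVR(C=100.0)),
]

# Stacking: out-of-fold base predictions (cv=5) train a linear meta regressor.
stack = StackingRegressor(estimators=base_learners,
                          final_estimator=LinearRegression(), cv=5)
# Baseline: unweighted average of the same base learners' predictions.
average = VotingRegressor(estimators=base_learners)

scores = {}
for name, model in [("stacking", stack), ("average", average)]:
    model.fit(X_tr, y_tr)
    scores[name] = r2_score(y_te, model.predict(X_te))

for name, r2 in scores.items():
    print(f"{name:<9} R^2 = {r2:.3f}")
```

With `cv=5`, `StackingRegressor` fits the meta regressor on out-of-fold predictions of the base learners, which keeps it from simply learning to trust whichever base model overfits the training set most.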

