Effects of dataset size and interactions on the prediction performance of logistic regression and deep learning models

Bailly, Alexandre; Blanc, Coralie; Francis, Élie; Guillotin, Thierry; Jamal, F; Wakim, Béchara; Roy, Pascal

doi:10.1016/j.cmpb.2021.106504

Cited by 76 publications

(27 citation statements)

References 16 publications

“…The table shows steady improvement for most models with a larger set of training data over all metrics, except for the small model SqueezeNet. Generally, deep learning models, unlike traditional machine learning, benefit from larger datasets [44], which may be the reason for improved performance. The sample confusion matrix for DarkNet-53 in Figure 6 shows considerably better performance in terms of entries with one or fewer false misclassifications.…”

Section: Results (mentioning)

confidence: 99%

On the Automatic Detection and Classification of Skin Cancer Using Deep Transfer Learning

Fraiwan

Faouri

2022

Sensors

Skin cancer (melanoma and non-melanoma) is one of the most common cancer types and leads to hundreds of thousands of yearly deaths worldwide. It manifests itself through abnormal growth of skin cells. Early diagnosis drastically increases the chances of recovery. Moreover, it may render surgical, radiographic, or chemical therapies unnecessary or lessen their overall usage. Thus, healthcare costs can be reduced. The process of diagnosing skin cancer starts with dermoscopy, which inspects the general shape, size, and color characteristics of skin lesions, and suspected lesions undergo further sampling and lab tests for confirmation. Image-based diagnosis has undergone great advances recently due to the rise of deep learning artificial intelligence. The work in this paper examines the applicability of raw deep transfer learning in classifying images of skin lesions into seven possible categories. Using the HAM10000 dataset of dermoscopy images, a system that accepts these images as input without explicit feature extraction or preprocessing was developed using 13 deep transfer learning models. Extensive evaluation revealed the advantages and shortcomings of such a method. Although some cancer types were correctly classified with high accuracy, the imbalance of the dataset, the small number of images in some categories, and the large number of classes reduced the best overall accuracy to 82.9%.
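The first citation statement refers to a confusion matrix, and the Fraiwan and Faouri abstract attributes part of the accuracy ceiling to class imbalance. As an illustrative sketch only (not the authors' evaluation code), the pure-Python functions below build a confusion matrix and contrast overall accuracy with per-class recall and balanced accuracy. The class codes and labels in the example are hypothetical toy data.

```python
# Illustrative sketch: confusion matrix and imbalance-aware metrics in pure Python.
# The labels used below are hypothetical toy data, not results from any cited study.


def confusion_matrix(y_true, y_pred, labels):
    """Return a square matrix where rows are true classes and columns are predictions."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[pred_label]] += 1
    return matrix


def overall_accuracy(matrix):
    """Fraction of all samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix):
    """Recall for each true class; classes with no samples get 0.0 by convention here."""
    recalls = []
    for i, row in enumerate(matrix):
        support = sum(row)
        recalls.append(row[i] / support if support else 0.0)
    return recalls


def balanced_accuracy(matrix):
    """Unweighted mean of per-class recalls, so minority classes count equally."""
    recalls = per_class_recall(matrix)
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    labels = ["nv", "mel"]                 # hypothetical class codes
    y_true = ["nv"] * 8 + ["mel"] * 2      # imbalanced toy ground truth
    y_pred = ["nv"] * 10                   # classifier that always predicts the majority class
    m = confusion_matrix(y_true, y_pred, labels)
    print(m)                               # [[8, 0], [2, 0]]
    print(overall_accuracy(m))             # 0.8
    print(balanced_accuracy(m))            # 0.5
```

With eight hypothetical `nv` cases and two `mel` cases, a classifier that always predicts `nv` reaches 0.8 overall accuracy while its balanced accuracy is only 0.5. This is one reason per-class metrics are usually reported alongside overall accuracy on imbalanced datasets.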

“…The other tested models with more parameters performed worse, as they seemed overparameterized and likely learned aberrant features, thus overfitting to the training data. Not many studies explore this phenomenon in detail, but a similar phenomenon was noted in the results of a recent study of Bailly et al.⁶⁴ studying the effects of dataset size, dataset complexity, and model complexity on performance.…”

Section: Deep-learning Model Architecture (mentioning)

confidence: 62%

Detection and localization of hyperfunctioning parathyroid glands on [¹⁸F]fluorocholine PET/CT using deep learning – model performance and comparison to human experts

Jarabek

Jamšek

Cuderman

et al. 2022

Radiology and Oncology

Background: In the setting of primary hyperparathyroidism (PHPT), [18F]fluorocholine PET/CT (FCH-PET) has excellent diagnostic performance, with experienced practitioners achieving 97.7% accuracy in localising hyperfunctioning parathyroid tissue (HPTT). Due to the relative triviality of the task for human readers, we explored the performance of deep learning (DL) methods for HPTT detection and localisation on FCH-PET images in the setting of PHPT. Patients and methods: We used a dataset of 93 subjects with PHPT imaged using FCH-PET, of which 74 subjects had visible HPTT while 19 controls had no visible HPTT on FCH-PET. A conventional Resnet10 as well as a novel mPETResnet10 DL model were trained and tested to detect (present, not present) and localise (upper left, lower left, upper right or lower right) HPTT. Our mPETResnet10 architecture also contained a region-of-interest masking algorithm that we evaluated qualitatively in order to try to explain the model’s decision process. Results: The models detected the presence of HPTT with an accuracy of 83% and determined the quadrant of HPTT with an accuracy of 74%. The DL methods performed statistically worse (p < 0.001) in both tasks compared to human readers, who localise HPTT with the accuracy of 97.7%. The produced region-of-interest mask, while not showing a consistent added value in the qualitative evaluation of model’s decision process, had correctly identified the foreground PET signal. Conclusions: Our experiment is the first reported use of DL analysis of FCH-PET in PHPT. We have shown that it is possible to utilize DL methods with FCH-PET to detect and localize HPTT. Given our small dataset of 93 subjects, results are nevertheless promising for further research.

“…The output of logistic regression is always between (0 and 1), which is suitable for the binary classification task. The higher the value, the higher the probability that the current sample will be classified as class 1 and vice versa (Bailly et al., 2022; Ma et al., 2023; van den Goorbergh et al., 2022; Zabor et al., 2022).…”

Section: Methods (mentioning)

confidence: 99%

Prediction of drinking water quality with machine learning models: A public health nursing approach

Özsezer

Mermer

2023

Public Health Nursing

Objective: The aim of this study is to use machine learning models to predict drinking water quality from a public health nursing approach. Design: Machine learning study. Sample: “Water Quality Dataset” was used in the study. The dataset contains physical and chemical measurements of water quality for 2400 different water bodies. The process consists of four stages: Data processing with Synthetic Minority Oversampling Technique, hyperparameter tuning with 10‐fold cross‐validation, modeling and comparative analysis. 80% of the dataset is allocated as training data and 20% as test data. ML models logistic regression, K‐nearest neighbor, support vector machine, random forest, XGBoost, AdaBoost Classifier, Decision Tree algorithms were used for water quality prediction. Accuracy, precision, recall, F1 score and AUC performance metrics of ML models were compared. To evaluate the performance of the models, 10‐fold cross‐validation was used and a comparative analysis was performed. The p‐values of the models were also compared. Results: In this study, where drinking water quality was predicted with seven different ML algorithms, it can be said that XGBoost and Random Forest are the best classification models in all performance metrics. There is a significant difference in all ML algorithms according to the p‐value. The H0 hypothesis is accepted for these algorithms. According to the H0 hypothesis, there is no difference between actual values and predicted values. Conclusion: In conclusion, the use of ML models in the prediction of drinking water quality can help nurses greatly improve access to clean water, a human right, be more knowledgeable about water quality, and protect the health of individuals.
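The citation statement quoted from Özsezer and Mermer describes the output of logistic regression as lying between 0 and 1, with higher values meaning a higher probability of class 1. A minimal pure-Python sketch of that mapping follows: a linear score passed through the logistic (sigmoid) function, then thresholded. The weights, bias, and 0.5 threshold are illustrative assumptions, not values fitted in any of the cited studies.

```python
# Illustrative sketch of how a logistic regression model turns a linear score into
# a probability in (0, 1) and then into a binary label. Weights, bias, and the 0.5
# threshold are hypothetical, not fitted to any dataset cited on this page.
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """Estimated probability that sample x belongs to class 1."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)


def predict_label(weights, bias, x, threshold=0.5):
    """Class 1 when the probability reaches the threshold, otherwise class 0."""
    return 1 if predict_proba(weights, bias, x) >= threshold else 0


if __name__ == "__main__":
    weights, bias = [2.0, -1.0], 0.0       # hypothetical coefficients
    for x in ([3.0, 1.0], [1.0, 2.0], [0.0, 3.0]):
        p = predict_proba(weights, bias, x)
        print(x, round(p, 3), predict_label(weights, bias, x))
```

Mathematically the sigmoid output is strictly between 0 and 1. In floating point it can round to exactly 0.0 or 1.0 when the linear score is very large in magnitude, so code that takes logarithms of these probabilities usually clips them first.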
