Knowledge transfer impacts the performance of deep learning -the state of the art for image classification tasks, including automated melanoma screening. Deep learning's greed for large amounts of training data poses a challenge for medical tasks, which we can alleviate by recycling knowledge from models trained on different tasks, in a scheme called transfer learning. Although much of the best art on automated melanoma screening employs some form of transfer learning, a systematic evaluation was missing. Here we investigate the presence of transfer, from which task the transfer is sourced, and the application of fine tuning (i.e., retraining of the deep learning model after transfer). We also test the impact of picking deeper (and more expensive) models. Our results favor deeper models, pretrained over ImageNet, with fine-tuning, reaching an AUC of 80.7% and 84.5% for the two skin-lesion datasets evaluated.
Deep learning fostered a leap ahead in automated skin lesion analysis in the last two years. Those models, however, are expensive to train and difficult to parameterize. Objective: We investigate methodological issues for designing and evaluating deep learning models for skin lesion analysis. We explore ten choices faced by researchers: use of transfer learning, model architecture, train dataset, image resolution, type of data augmentation, input normalization, use of segmentation, duration of training, additional use of Support Vector Machines, and test data augmentation. Methods: We perform two full factorial experiments, for five different test datasets, resulting in 2560 exhaustive trials in our main experiment, and 1280 trials in our assessment of transfer learning. We analyze both with multi-way analyses of variance (ANOVA). We use the exhaustive trials to simulate sequential decisions and ensembles, with and without the use of privileged information from the test set. Results main experiment: Amount of train data has disproportionate influence, explaining almost half the variation in performance. Of the other factors, test data augmentation and input resolution are the most influential. Deeper models, when combined, with extra data, also help. -transfer experiment: Transfer learning is critical, its absence brings huge performance penalties.simulations: Ensembles of models are the best option to provide reliable results with limited resources, without using privileged information and sacrificing methodological rigor. Conclusions and Significance: Advancing research on automated skin lesion analysis requires curating larger public datasets. Indirect use of privileged information from the test set to design the models is a subtle, but frequent methodological mistake that leads to overoptimistic results. Ensembles of models are a cost-effective alternative to the expensive full-factorial and to the unstable sequential designs.
No abstract
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.