Background: Artificial intelligence can be trained to outperform dermatologists in image-based skin cancer diagnostics. However, the networks' sensitivity to biases and overfitting may hamper their clinical applicability. Objectives: The aim of this study was to examine the potential consequences of implementing convolutional neural networks for stand-alone melanoma diagnostics and skin lesion triage. Methods: In this algorithm validation study on retrospective data, we reproduced and evaluated the performance of state-of-the-art convolutional neural networks for skin cancer diagnostics. The networks were trained on 25,331 annotated dermoscopic skin lesion images from an open-source data set (ISIC-2019) and tested on a novel data set (AISC-2021) of 26,591 annotated dermoscopic skin lesion images. We assessed the trained algorithms' ability to generalize to new data and their diagnostic performance in two simulations (melanoma diagnostics and skin lesion triage). Results: The trained algorithms classified images of nevi, melanomas and actinic keratoses from the AISC-2021 data set significantly less accurately than those from the ISIC-2019 data set (p < 0.003). Almost one-third (31.1%) of the melanomas were misclassified during the melanoma diagnostics simulation, irrespective of their Breslow thickness. Furthermore, the algorithms marked 92.7% of the lesions 'suspicious' during the triage simulation, yielding a triage sensitivity of 99.7% and a specificity of 8.2%. Conclusions: Although state-of-the-art artificial intelligence outperforms dermatologists on image-based skin lesion classification within an artificial setting, its limited ability to generalize to new data undermines its readiness for stand-alone melanoma diagnostics and skin lesion triage in clinical practice.
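The triage metrics reported above follow directly from a binary confusion matrix. Below is a minimal Python sketch, not the study's code; the array contents and the 0.5 threshold are illustrative assumptions, showing how the sensitivity and specificity of a 'suspicious' flag would be computed from per-lesion malignancy probabilities:

```python
import numpy as np

# Hypothetical inputs (not from the study): one entry per lesion.
# y_true: 1 = malignant lesion, 0 = benign lesion
# p_malignant: network's predicted probability that the lesion is malignant
y_true = np.array([1, 0, 0, 1, 0, 1, 0, 0])
p_malignant = np.array([0.91, 0.62, 0.08, 0.77, 0.55, 0.34, 0.70, 0.12])

threshold = 0.5  # assumed triage cut-off; the abstract does not state one
flagged = p_malignant >= threshold  # lesions marked 'suspicious'

tp = np.sum(flagged & (y_true == 1))    # malignant and flagged
fn = np.sum(~flagged & (y_true == 1))   # malignant but missed
tn = np.sum(~flagged & (y_true == 0))   # benign and cleared
fp = np.sum(flagged & (y_true == 0))    # benign but flagged

sensitivity = tp / (tp + fn)  # fraction of malignant lesions flagged
specificity = tn / (tn + fp)  # fraction of benign lesions cleared
print(f"sensitivity={sensitivity:.3f}, specificity={specificity:.3f}")
```

Lowering the threshold reproduces the trade-off the study reports: near-perfect sensitivity (99.7%) at the cost of flagging almost every lesion, hence the 8.2% specificity.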
Background: Differentiating between benign and malignant skin lesions can be very difficult and should only be done by sufficiently trained and skilled clinicians. To our knowledge, there are no validated tests for reliably assessing clinicians' ability to perform skin cancer diagnostics. Objective: To develop and gather validity evidence for a test in skin cancer diagnostics. Methods: A multiple-choice questionnaire (MCQ) was developed based on informal interviews with seven content experts from five skin cancer centers in Denmark. Validity evidence for the test was gathered from May to July 2019 using Messick's validity framework (content, response process, internal structure, relationship to other variables, and consequences). Item content was revised through a Delphi-like review process and then piloted on 36 medical students and 136 doctors using a standardized response process. The results enabled an analysis of the test's internal structure and its relationship to other variables. Finally, the contrasting groups method was used to investigate the test's consequences (pass-fail standard). Results: The initial 90-item MCQ was reduced to 40 items during the Delphi-like review process. Item analysis revealed that 25 of the 40 selected items were level I-III quality items with high internal consistency (Cronbach's alpha = 0.83) and highly significant (P ≤ 0.0001) differences in test scores between participants with different occupations or levels of experience. A pass-fail standard of 12 (48%) correct answers was established using the contrasting groups method. Conclusion: The skin cancer diagnostics MCQ developed in this study can be used for reliable assessments of clinicians' competencies.
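For context, the internal-consistency figure quoted above (Cronbach's alpha = 0.83) is computed from the participants-by-items score matrix. A minimal sketch, assuming a binary (correct/incorrect) response matrix; the data below are illustrative, not the study's:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a (participants x items) score matrix."""
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Illustrative 0/1 response matrix: 6 participants, 5 items (not real data)
responses = np.array([
    [1, 1, 1, 0, 1],
    [1, 0, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
])
print(f"alpha = {cronbach_alpha(responses):.2f}")
```

Higher alpha indicates that the retained items measure a common underlying competency, which is what the item analysis in the study was screening for.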
When doctors are trained to diagnose a specific disease, they learn faster when presented with cases in order of increasing difficulty. This creates the need for automatically estimating how difficult it is for doctors to classify a given case. In this paper, we introduce methods for estimating how hard it is for a doctor to diagnose a case represented by a medical image, both when ground truth difficulties are available for training, and when they are not. Our methods are based on embeddings obtained with deep metric learning. Additionally, we introduce a practical method for obtaining ground truth human difficulty for each image case in a dataset using self-assessed certainty. We apply our methods to two different medical datasets, achieving high Kendall rank correlation coefficients on both, showing that we outperform existing methods by a large margin on our problem and data.
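One way to evaluate such a difficulty estimator, consistent with the metric reported above, is Kendall's rank correlation between predicted and ground-truth difficulties. A minimal sketch follows; the embedding-based difficulty proxy (ratio of nearest same-class to nearest other-class distance) is our illustrative assumption, not necessarily the paper's exact method:

```python
import numpy as np
from scipy.stats import kendalltau

def difficulty_from_embeddings(emb: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Illustrative proxy: a case is 'hard' when its nearest other-class
    neighbor in embedding space is about as close as its nearest
    same-class neighbor."""
    dists = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # exclude self-distances
    scores = np.empty(len(emb))
    for i, y in enumerate(labels):
        same = dists[i, labels == y].min()
        other = dists[i, labels != y].min()
        scores[i] = same / (same + other)  # higher = harder to classify
    return scores

rng = np.random.default_rng(0)
emb = rng.normal(size=(20, 8))         # stand-in for learned embeddings
labels = rng.integers(0, 2, size=20)   # binary diagnosis labels
true_difficulty = rng.random(20)       # stand-in for human difficulty ratings
pred = difficulty_from_embeddings(emb, labels)
tau, p = kendalltau(pred, true_difficulty)
print(f"Kendall tau = {tau:.2f} (p = {p:.3f})")
```

A Kendall tau near 1 would mean the estimator orders cases by difficulty almost exactly as doctors experience them, which is the property needed to sequence training cases from easy to hard.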