Artificial intelligence is becoming increasingly important in dermatology, with studies reporting accuracy matching or exceeding dermatologists for the diagnosis of skin lesions from clinical and dermoscopic images. However, real-world clinical validation is currently lacking. We review dermatological applications of deep learning, the leading artificial intelligence technology for image analysis, and discuss its current capabilities, potential failure modes, and challenges surrounding performance assessment and interpretability. We address the following three primary applications: (i) teledermatology, including triage for referral to dermatologists; (ii) augmenting clinical assessment during face-to-face visits; and (iii) dermatopathology. We discuss equity and ethical issues related to future clinical adoption and recommend specific standardization of metrics for reporting model performance.
Artificial intelligence models match or exceed dermatologists in melanoma image classification. Less is known about their robustness against real-world variations, and clinicians may incorrectly assume that a model with an acceptable area under the receiver operating characteristic curve or related performance metric is ready for clinical use. Here, we systematically assessed the performance of dermatologist-level convolutional neural networks (CNNs) on real-world non-curated images by applying computational “stress tests”. Our goal was to create a proxy environment in which to comprehensively test the generalizability of off-the-shelf CNNs developed without training or evaluation protocols specific to individual clinics. We found inconsistent predictions on images captured repeatedly in the same setting or subjected to simple transformations (e.g., rotation). Such transformations resulted in false positive or negative predictions for 6.5–22% of skin lesions across test datasets. Our findings indicate that models meeting conventionally reported metrics need further validation with computational stress tests to assess clinic readiness.
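The stress-test idea described in this abstract can be sketched in a few lines: apply a label-preserving transformation (here, 90° rotations) to each image and flag lesions whose thresholded prediction flips. This is a minimal illustration under stated assumptions; `model`, `stress_test`, and the choice of rotations as the transformation family are illustrative, not the paper's actual implementation.

```python
import numpy as np

def rotate90(image: np.ndarray, k: int) -> np.ndarray:
    """Rotate an HxWxC image by k * 90 degrees (a label-preserving transform)."""
    return np.rot90(image, k, axes=(0, 1))

def stress_test(model, images, threshold=0.5):
    """Flag images whose predicted class flips under simple rotations.

    `model` is any callable mapping an image array to a malignancy
    probability; the name and signature are assumptions for this sketch.
    Returns the indices of images with rotation-inconsistent predictions.
    """
    unstable = []
    for i, img in enumerate(images):
        preds = {model(rotate90(img, k)) >= threshold for k in range(4)}
        if len(preds) > 1:  # prediction is not invariant to rotation
            unstable.append(i)
    return unstable
```

A clinic-readiness check in this spirit would run such transforms over a held-out set and report the fraction of lesions with flipped predictions, alongside conventional metrics such as AUROC.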
Interpretability methods for image classification assess model trustworthiness by attempting to expose whether the model is systematically biased or attending to the same cues as a human would. Saliency methods for feature attribution dominate the interpretability literature, but these methods do not address semantic concepts such as the textures, colors, or genders of objects within an image. Our proposed Robust Concept Activation Vectors (RCAV) quantifies the effects of semantic concepts on individual model predictions and on model behavior as a whole. RCAV calculates a concept gradient and takes a gradient ascent step to assess model sensitivity to the given concept. By generalizing previous work on concept activation vectors to account for model non-linearity, and by introducing stricter hypothesis testing, we show that RCAV yields interpretations that are both more accurate at the image level and more robust at the dataset level. RCAV, like saliency methods, supports the interpretation of individual predictions. To evaluate the practical use of interpretability methods as debugging tools, and the scientific use of interpretability methods for identifying inductive biases (e.g., texture over shape), we construct two datasets and accompanying metrics for realistic benchmarking of semantic interpretability methods. Our benchmarks expose the importance of counterfactual augmentation and negative controls for quantifying the practical usability of interpretability methods.
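The two steps named in this abstract, fitting a concept activation vector and taking a step along it to probe sensitivity, can be sketched as follows. This is a simplified stand-in, not the paper's RCAV code: `fit_cav` uses a mean-difference direction where a full implementation would train a linear classifier on layer activations, and `head`, `concept_sensitivity`, and the step size are assumed names and parameters.

```python
import numpy as np

def fit_cav(concept_acts: np.ndarray, random_acts: np.ndarray) -> np.ndarray:
    """Fit a concept activation vector in a layer's activation space.

    Here the CAV is the normalized difference of class means; actual CAV
    methods fit a linear classifier separating concept from random examples.
    """
    v = concept_acts.mean(axis=0) - random_acts.mean(axis=0)
    return v / np.linalg.norm(v)

def concept_sensitivity(head, act: np.ndarray, cav: np.ndarray,
                        step: float = 0.5) -> float:
    """Perturb an activation along the CAV and measure the output change.

    Taking a finite step through the (possibly non-linear) downstream head,
    rather than a pure directional derivative, is what lets this score
    account for model non-linearity.
    """
    return head(act + step * cav) - head(act)
```

Aggregating such per-example sensitivities over a dataset, with hypothesis testing against CAVs fit to random concepts, gives the dataset-level interpretation the abstract refers to.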
Atopic dermatitis (AD) is a chronic inflammatory skin disease. Many patients with AD seek care from both primary care physicians and dermatologists. However, little is known regarding topical corticosteroid prescribing patterns between these specialties. We sought to determine if differences exist in the topical corticosteroid (TCS) prescribing patterns of dermatologists, family medicine physicians, and internal medicine physicians. We conducted a population-based, cross-sectional analysis using data from the National Ambulatory Medical Care Survey (NAMCS) from 2006 to 2016. There were 5,071,158 (weighted) outpatient AD visits between 2006 and 2016 for adults who were seen by physicians from family medicine, internal medicine, and dermatology. There was not a statistically significant difference in the rate of TCS prescriptions for AD between family medicine physicians and dermatologists (39.1% vs. 52.2%; p = 0.27). However, family medicine physicians had a higher rate of prescribing TCS for AD than internal medicine physicians (39.1% vs. 5.1%; p = 0.002). Dermatologists had a significantly higher rate of prescribing TCS for AD compared to internal medicine physicians (52.2% vs. 5.1%; p < 0.001). Our findings demonstrate that dermatologists prescribe topical corticosteroids for atopic dermatitis more frequently than internal medicine physicians, but not more frequently than family medicine physicians. It is important to understand the key differences in practice patterns among medical specialties for AD care and to identify educational gaps.
We study objective robustness failures, a type of out-of-distribution robustness failure in reinforcement learning (RL). Objective robustness failures occur when an RL agent retains its capabilities off-distribution yet pursues the wrong objective. We provide the first explicit empirical demonstrations of objective robustness failures and argue that this type of failure is critical to address.
No detectable systemic absorption of topically applied 0.02% and 0.04% chlormethine (CL) gel in patients with mycosis fungoides cutaneous T-cell lymphoma (MF-CTCL).