Artificial intelligence is becoming increasingly important in dermatology, with studies reporting accuracy matching or exceeding dermatologists for the diagnosis of skin lesions from clinical and dermoscopic images. However, real-world clinical validation is currently lacking. We review dermatological applications of deep learning, the leading artificial intelligence technology for image analysis, and discuss its current capabilities, potential failure modes, and challenges surrounding performance assessment and interpretability. We address the following three primary applications: (i) teledermatology, including triage for referral to dermatologists; (ii) augmenting clinical assessment during face-to-face visits; and (iii) dermatopathology. We discuss equity and ethical issues related to future clinical adoption and recommend specific standardization of metrics for reporting model performance.
Artificial intelligence models match or exceed dermatologists in melanoma image classification. Less is known about their robustness against real-world variations, and clinicians may incorrectly assume that a model with an acceptable area under the receiver operating characteristic curve or related performance metric is ready for clinical use. Here, we systematically assessed the performance of dermatologist-level convolutional neural networks (CNNs) on real-world non-curated images by applying computational “stress tests”. Our goal was to create a proxy environment in which to comprehensively test the generalizability of off-the-shelf CNNs developed without training or evaluation protocols specific to individual clinics. We found inconsistent predictions on images captured repeatedly in the same setting or subjected to simple transformations (e.g., rotation). Such transformations resulted in false positive or negative predictions for 6.5–22% of skin lesions across test datasets. Our findings indicate that models meeting conventionally reported metrics need further validation with computational stress tests to assess clinic readiness.
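The stress-test idea described in this abstract can be sketched in a few lines: apply a label-preserving transformation (here, 90° rotations) to each image and flag lesions whose thresholded prediction flips. This is a minimal illustration under stated assumptions; `model`, `stress_test`, and the choice of rotations as the transformation family are illustrative, not the paper's actual implementation.

```python
import numpy as np

def rotate90(image: np.ndarray, k: int) -> np.ndarray:
    """Rotate an HxWxC image by k * 90 degrees (a label-preserving transform)."""
    return np.rot90(image, k, axes=(0, 1))

def stress_test(model, images, threshold=0.5):
    """Flag images whose predicted class flips under simple rotations.

    `model` is any callable mapping an image array to a malignancy
    probability; the name and signature are assumptions for this sketch.
    Returns the indices of images with rotation-inconsistent predictions.
    """
    unstable = []
    for i, img in enumerate(images):
        preds = {model(rotate90(img, k)) >= threshold for k in range(4)}
        if len(preds) > 1:  # prediction is not invariant to rotation
            unstable.append(i)
    return unstable
```

A clinic-readiness check in this spirit would run such transforms over a held-out set and report the fraction of lesions with flipped predictions, alongside conventional metrics such as AUROC.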
Interpretability methods for image classification assess model trustworthiness by attempting to expose whether the model is systematically biased or attending to the same cues as a human would. Saliency methods for feature attribution dominate the interpretability literature, but these methods do not address semantic concepts such as the textures, colors, or genders of objects within an image. Our proposed Robust Concept Activation Vectors (RCAV) quantifies the effects of semantic concepts on individual model predictions and on model behavior as a whole. RCAV calculates a concept gradient and takes a gradient ascent step to assess model sensitivity to the given concept. By generalizing previous work on concept activation vectors to account for model non-linearity, and by introducing stricter hypothesis testing, we show that RCAV yields interpretations that are both more accurate at the image level and more robust at the dataset level. RCAV, like saliency methods, supports the interpretation of individual predictions. To evaluate the practical use of interpretability methods as debugging tools, and the scientific use of interpretability methods for identifying inductive biases (e.g., texture over shape), we construct two datasets and accompanying metrics for realistic benchmarking of semantic interpretability methods. Our benchmarks expose the importance of counterfactual augmentation and negative controls for quantifying the practical usability of interpretability methods.
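The two steps named in this abstract, fitting a concept activation vector and taking a step along it to probe sensitivity, can be sketched as follows. This is a simplified stand-in, not the paper's RCAV code: `fit_cav` uses a mean-difference direction where a full implementation would train a linear classifier on layer activations, and `head`, `concept_sensitivity`, and the step size are assumed names and parameters.

```python
import numpy as np

def fit_cav(concept_acts: np.ndarray, random_acts: np.ndarray) -> np.ndarray:
    """Fit a concept activation vector in a layer's activation space.

    Here the CAV is the normalized difference of class means; actual CAV
    methods fit a linear classifier separating concept from random examples.
    """
    v = concept_acts.mean(axis=0) - random_acts.mean(axis=0)
    return v / np.linalg.norm(v)

def concept_sensitivity(head, act: np.ndarray, cav: np.ndarray,
                        step: float = 0.5) -> float:
    """Perturb an activation along the CAV and measure the output change.

    Taking a finite step through the (possibly non-linear) downstream head,
    rather than a pure directional derivative, is what lets this score
    account for model non-linearity.
    """
    return head(act + step * cav) - head(act)
```

Aggregating such per-example sensitivities over a dataset, with hypothesis testing against CAVs fit to random concepts, gives the dataset-level interpretation the abstract refers to.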
Atopic dermatitis (AD) is a chronic inflammatory skin disease. Many patients with AD seek care from both primary care physicians and dermatologists. However, little is known regarding topical corticosteroid prescribing patterns between these specialties. We sought to determine if differences exist in the topical corticosteroid (TCS) prescribing patterns of dermatologists, family medicine physicians, and internal medicine physicians. We conducted a population-based, cross-sectional analysis using data from the National Ambulatory Medical Care Survey (NAMCS) from 2006 to 2016. There were 5,071,158 (weighted) outpatient AD visits between 2006 and 2016 for adults who were seen by physicians from family medicine, internal medicine, and dermatology. There was not a statistically significant difference in the rate of TCS prescriptions for AD between family medicine physicians and dermatologists (39.1% vs. 52.2%; p = 0.27). However, family medicine physicians had a higher rate of prescribing TCS for AD than internal medicine physicians (39.1% vs. 5.1%; p = 0.002). Dermatologists had a significantly higher rate of prescribing TCS for AD compared to internal medicine physicians (52.2% vs. 5.1%; p < 0.001). Our findings demonstrate that dermatologists prescribe topical corticosteroids for atopic dermatitis more frequently than internal medicine physicians, but not more frequently than family medicine physicians. It is important to understand the key differences in practice patterns among medical specialties for AD care and to identify educational gaps.
We study objective robustness failures, a type of out-of-distribution robustness failure in reinforcement learning (RL). Objective robustness failures occur when an RL agent retains its capabilities off-distribution yet pursues the wrong objective. We provide the first explicit empirical demonstrations of objective robustness failures and argue that this type of failure is critical to address.
No detectable systemic absorption of topically applied 0.02% and 0.04% chlormethine (CL) gel in patients with mycosis fungoides cutaneous T-cell lymphoma (MF-CTCL).