2021
DOI: 10.21203/rs.3.rs-1000939/v1
Preprint

ENRICH: Exploiting Image Similarity to Maximize Efficient Machine Learning in Medical Imaging

Abstract: Deep learning (DL) requires labeled data. Labeling medical images requires medical expertise, which is often a bottleneck. It is therefore useful to prioritize labeling those images that are most likely to improve a model's performance, a practice known as instance selection. Here we introduce ENRICH, a method that selects images for labeling based on how much novelty each image adds to the growing training set. In our implementation, we use cosine similarity between autoencoder embeddings to measure that nove…
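
The abstract is truncated here, so the exact ENRICH selection rule is not shown. As a rough illustration of novelty-based instance selection with cosine similarity, the sketch below assumes each image already has an embedding vector (for example, from an autoencoder) and greedily picks the image whose highest similarity to the already-selected set is lowest. The function name `select_by_novelty`, the `seed_index` parameter, and the max-min rule itself are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def select_by_novelty(embeddings: np.ndarray, budget: int, seed_index: int = 0) -> list[int]:
    """Greedy novelty selection (illustrative assumption, not the ENRICH code).

    At each step, pick the candidate whose maximum cosine similarity to the
    images selected so far is smallest, i.e. the most novel candidate.
    """
    n = len(embeddings)
    if not 0 < budget <= n:
        raise ValueError("budget must be between 1 and the number of embeddings")

    # L2-normalize rows so that a dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)

    selected = [seed_index]
    # max_sim[i]: highest cosine similarity between candidate i and the selected set.
    max_sim = unit @ unit[seed_index]
    max_sim[seed_index] = np.inf  # mark as taken so it is never picked again

    while len(selected) < budget:
        nxt = int(np.argmin(max_sim))  # least similar to the current set
        selected.append(nxt)
        # Update each candidate's similarity to the enlarged selected set.
        max_sim = np.maximum(max_sim, unit @ unit[nxt])
        max_sim[nxt] = np.inf

    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_embeddings = rng.normal(size=(100, 16))  # stand-in for autoencoder outputs
    print(select_by_novelty(fake_embeddings, budget=10))
```

Under this rule, near-duplicate images (such as adjacent frames from one ultrasound clip) are chosen late or not at all, because their similarity to an already-selected image is close to 1.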

Cited by 3 publications (3 citation statements)
References: 34 publications
“…Using our pipeline (Methods, Fig 1a), we created new hybrid images that were realistic overall with only small cut-paste combination artifacts (Fig 1b). Adjacent frames in an ultrasound video from the same patient will generally have very similar heart structure [21], so we randomly chose only one frame per view per patient ID for target views and five frames per patient ID for NT view to generate a set of candidate images; these frames also had to pass quality control (Methods). Thus, we used only five percent of the original training data to synthesize thousands of new images such that the new hybrid training dataset was approximately the size of the original training dataset (Table 1).…”
Section: Results (mentioning)
confidence: 99%
“…Efficient use of training data in model training can lighten data labeling burden, especially when combined with other strategies for training dataset curation [21]. This is particularly advantageous in the medical domain where there is a scarcity of data and experts to label this data.…”
Section: Discussion (mentioning)
confidence: 99%
“…This study has several strengths. First, the diversity of the imaging cohort (23) represents a robust test for the DL model. This cohort is external to the dataset the model was trained on, differs with respect to image formats and scanning protocol, and represents imaging from several clinics and sonographers.…”
Section: Discussion (mentioning)
confidence: 99%