Ismat Ara Reshma scite author profile

The class distribution of a training data set is an important factor which influences the performance of a deep learning-based system. Understanding the optimal class distribution is therefore crucial when building a new training set which may be costly to annotate. This is the case for histological images used in cancer diagnosis where image annotation requires domain experts. In this paper we tackle the problem of finding the optimal class distribution of a training set to be able to train an optimal model that detects cancer in histological images. We formulate several hypotheses which are then tested in scores of experiments with hundreds of trials. The experiments have been designed to account for both segmentation and clas-

show abstract

Natural vs balanced distribution in deep learning on whole slide images for cancer detection

Reshma

Cussat‐Blanc

Ionescu

et al. 2021

View full text Add to dashboard Cite

The class distribution of data is one of the factors that regulates the performance of machine learning models. However, investigations on the impact of different distributions available in the literature are very few, sometimes absent for domain-specific tasks.In this paper, we analyze the impact of natural and balanced distributions of the training set in deep learning (DL) models applied on histological images, also known as whole slide images (WSIs). WSIs are considered as the gold standard for cancer diagnosis. In recent years, researchers have turned their attention to DL models to automate and accelerate the diagnosis process. In the training of such DL models, filtering out the non-regions-of-interest from the WSIs and adopting an artificial distribution-usually a balanced distribution-is a common trend. In our analysis, we show that keeping the WSIs data in their usual distribution-which we call natural distribution-for DL training produces fewer false positives (FPs) with comparable false negatives (FNs) than the artificially-obtained balanced distribution. We conduct an empirical comparative study with 10 random folds for each distribution, comparing the resulting average performance levels in terms of five different evaluation metrics. Experimental results show the effectiveness of the natural distribution over the balanced one across all the evaluation metrics.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Ismat Ara Reshma

Deep Analysis of CNN Settings for New Cancer Whole-slide Histological Images Segmentation: The Case of Small Training Sets

Finding a Suitable Class Distribution for Building Histological Images Datasets Used in Deep Model Training—The Case of Cancer Detection

Natural vs balanced distribution in deep learning on whole slide images for cancer detection

Contact Info

Product

Resources

About