Automated detection of exudates from fundus images plays an important role in diabetic retinopathy (DR) screening and evaluation, for which supervised or semi-supervised learning methods are typically preferred. However, a potential limitation of supervised and semi-supervised learning based detection algorithms is that they depend substantially on the sample size of training data and the quality of annotations, which is the fundamental motivation of this work. In this study, we construct a dataset containing 1219 fundus images (from DR patients and healthy controls) with annotations of exudate lesions. In addition to exudate annotations, we also provide four additional labels for each image: left-versus-right eye label, DR grade (severity scale) from three different grading protocols, the bounding box of the optic disc (OD), and fovea location. This dataset provides a great opportunity to analyze the accuracy and reliability of different exudate detection, OD detection, fovea localization, and DR classification algorithms. Moreover, it will facilitate the development of such algorithms in the realm of supervised and semi-supervised learning.
Although deep learning based diabetic retinopathy (DR) classification methods typically benefit from welldesigned architectures of convolutional neural networks, the training setting also has a non-negligible impact on the prediction performance. The training setting includes various interdependent components, such as objective function, data sampling strategy and data augmentation approach. To identify the key components in a standard deep learning framework (ResNet-50) for DR grading, we systematically analyze the impact of several major components. Extensive experiments are conducted on a publicly-available dataset EyePACS. We demonstrate that (1) the ResNet-50 framework for DR grading is sensitive to input resolution, objective function, and composition of data augmentation, (2) using mean square error as the loss function can effectively improve the performance with respect to a task-specific evaluation metric, namely the quadratically-weighted Kappa, (3) utilizing eye pairs boosts the performance of DR grading and (4) using data resampling to address the problem of imbalanced data distribution in EyePACS hurts the performance. Based on these observations and an optimal combination of the investigated components, our framework, without any specialized network design, achieves the state-of-the-art result (0.8631 for Kappa) on the EyePACS test set (a total of 42670 fundus images) with only image-level labels. Our codes and pre-trained model are available at https://github.com/YijinHuang/pytorch-classification.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.