VinDr-CXR: An open dataset of chest X-rays with radiologist's annotations

Nguyen, Ha Q.; Lam, Khanh; Le, Linh; Pham, Hieu H.; Tran, Dat Q.; Nguyen, Dung; Le, Dung; Pham, Chi M.; Tong, Hang T. T.; Dinh, Diep H.; Do, Cuong D.; Doan, Luu T.; Nguyen, Cao; Nguyen, Binh T.; Nguyen, Que V.; Hoang, Au D.; Phan, Hien N.; Nguyễn, Anh Tuấn; Ho, Phuong H.; Ngo, Dat; Nguyen, Nghia; Nguyen, Nhan T.; Dao, Minh Quang; Vu, Van

doi:10.48550/arxiv.2012.15029

Cited by 25 publications

(40 citation statements)

References 16 publications

“…We illustrate FCL on anterior and posterior chest X-rays (CXRs). We use three public large-scale CXR datasets as the unlabeled pre-training data to simulate the federated environment, namely CheXpert [11], ChestX-ray8 [26], and VinDr-CXR [17] (see Table 1). Three datasets are collected and annotated from different sources independently and express a large variety in data modalities (see Fig.…”

[Table 1 (fragment): columns are dataset, size, # of classes, multi-label, multi-view, balanced, resolution; the one surviving row is CheXpert [11], size 371920, 14 classes, resolution 390 × 320; the ChestX-ray8 [26] row is cut off.]

Section: Methods (mentioning)

confidence: 99%

Federated Contrastive Learning for Decentralized Unlabeled Medical Images

Dong, Voiculescu

2021

Preprint

A label-efficient paradigm in computer vision is based on self-supervised contrastive pre-training on unlabeled data followed by fine-tuning with a small number of labels. Making practical use of a federated computing environment in the clinical domain and learning on medical images poses specific challenges. In this work, we propose FedMoCo, a robust federated contrastive learning (FCL) framework, which makes efficient use of decentralized unlabeled medical data. FedMoCo has two novel modules: metadata transfer, an inter-node statistical data augmentation module, and self-adaptive aggregation, an aggregation module based on representational similarity analysis. To the best of our knowledge, this is the first FCL work on medical images. Our experiments show that FedMoCo can consistently outperform FedAvg, a seminal federated learning framework, in extracting meaningful representations for downstream tasks. We further show that FedMoCo can substantially reduce the amount of labeled data required in a downstream task, such as COVID-19 detection, to achieve a reasonable performance.

“…Computer-aided diagnosis (CAD) systems for identification of lung abnormality in adult CXRs have recently achieved great success thanks to the availability of large labeled datasets [6][7][8][9][10]. Many large-scale CXR datasets of adult patients such as ChestX-ray14 [6], Padchest [7], CheXpert [8], MIMIC-CXR [9] and VinDr-CXR [10] have been established and released in recent years. These datasets boosted new advances in exploring new machine learning-based approaches in the interpretation of CXR in adults [8][11][12][13][14][15][16].…”

Section: Background and Summary (mentioning)

confidence: 99%

VinDr-PCXR: An open, large-scale chest radiograph dataset for interpretation of common thoracic diseases in children

Nguyen, Pham, Tran et al. 2022

Preprint

Self Cite

Computer-aided diagnosis systems in adult chest radiography (CXR) have recently achieved great success thanks to the availability of large-scale, annotated datasets and the advent of high-performance supervised learning algorithms. However, the development of diagnostic models for detecting and diagnosing pediatric diseases in CXR scans remains understudied due to the lack of high-quality physician-annotated datasets. To overcome this challenge, we introduce and release VinDr-PCXR, a new pediatric CXR dataset of 9,125 studies retrospectively collected from a major pediatric hospital in Vietnam between 2020 and 2021. Each scan was manually annotated by a pediatric radiologist who has more than ten years of experience. The dataset was labeled for the presence of 36 critical findings and 15 diseases. In particular, each abnormal finding was identified via a rectangular bounding box on the image. To the best of our knowledge, this is the first and largest pediatric CXR dataset containing lesion-level annotations and image-level labels for the detection of multiple findings and diseases. For algorithm development, the dataset was divided into a training set of 7,728 and a test set of 1,397. To encourage new advances in pediatric CXR interpretation using data-driven approaches, we provide a detailed description of the VinDr-PCXR data sample and make the dataset publicly available on https://physionet.org/.

“…Earlier, NIH Chest X-ray 14 [30], proposed by Wang et al., contains 112,120 front-view images of 14 disease categories, among which there are 880 images of 8 categories containing box annotations. Lately, Nguyen et al. proposed VinDr-CXR [19], which contains 18,000 images that were manually annotated with 22 classes of rectangles surrounding abnormalities and 6 global labels of suspected diseases. There also exist some datasets that focus on a single disease, such as the Pneumonia detection dataset¹, Tuberculosis detection dataset [18] and Pneumothorax segmentation dataset², etc.…”

Section: Automatic Chest X-ray Analysis (mentioning)

confidence: 99%

A Structure-Aware Relation Network for Thoracic Diseases Detection and Segmentation

Lian, Liu, Zhang et al. 2021

IEEE Trans. Med. Imaging

Instance-level detection and segmentation of thoracic diseases or abnormalities are crucial for automatic diagnosis in chest X-ray images. Leveraging constant structure and disease relations extracted from domain knowledge, we propose a structure-aware relation network (SAR-Net) extending Mask R-CNN. The SAR-Net consists of three relation modules: (1) the anatomical structure relation module, encoding spatial relations between diseases and anatomical parts; (2) the contextual relation module, aggregating clues based on query-key pairs of disease RoI and lung fields; and (3) the disease relation module, propagating co-occurrence and causal relations into disease proposals. Towards making a practical system, we also provide ChestX-Det, a chest X-ray dataset with instance-level annotations (boxes and masks). ChestX-Det is a subset of the public dataset NIH ChestX-ray14. It contains ~3,500 images of 13 common disease categories labeled by three board-certified radiologists. We evaluate our SAR-Net on it and another dataset, DR-Private. Experimental results show that it can enhance the strong baseline of Mask R-CNN with significant improvements. The ChestX-Det is released at
