In this paper, we propose a novel approach for few-shot semantic segmentation with sparsely labeled images. We investigate the effectiveness of our method, which is based on the Model-Agnostic Meta-Learning (MAML) algorithm, in the medical scenario, where the use of sparse labeling and few-shot learning can alleviate the cost of producing new annotated datasets. Our method uses sparse labels in the meta-training and dense labels in the meta-test, thus making the model learn to predict dense labels from sparse ones. We conducted experiments with four Chest X-Ray datasets to evaluate two types of annotations (grid and points). The results show that our method is most suitable when the target domain differs greatly from the source domains, achieving Jaccard scores comparable to those of dense labels while using labels for less than 2% of the pixels of an image in few-shot scenarios.
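The paper does not give implementation details at this point, but the two annotation styles it evaluates can be illustrated with a short sketch: deriving grid and point annotations from a dense mask by marking all other pixels with an ignore index. The function names, the ignore value, and all parameters here are assumptions for illustration, not the authors' code.

```python
import numpy as np

IGNORE = -1  # assumed ignore index for unannotated pixels


def grid_annotation(mask: np.ndarray, spacing: int = 8) -> np.ndarray:
    """Keep labels only on a regular grid; all other pixels are ignored."""
    sparse = np.full_like(mask, IGNORE)
    sparse[::spacing, ::spacing] = mask[::spacing, ::spacing]
    return sparse


def point_annotation(mask: np.ndarray, points_per_class: int = 20,
                     seed: int = 0) -> np.ndarray:
    """Keep labels only at a few randomly chosen pixels of each class."""
    rng = np.random.default_rng(seed)
    sparse = np.full_like(mask, IGNORE)
    for cls in np.unique(mask):
        ys, xs = np.nonzero(mask == cls)
        idx = rng.choice(len(ys), size=min(points_per_class, len(ys)),
                        replace=False)
        sparse[ys[idx], xs[idx]] = cls
    return sparse


# Toy 128x128 dense mask: right half is foreground (1), rest background (0).
mask = np.zeros((128, 128), dtype=np.int64)
mask[:, 64:] = 1

grid = grid_annotation(mask, spacing=8)
pts = point_annotation(mask, points_per_class=20)

frac_grid = (grid != IGNORE).mean()   # 16*16 / 128*128 = 1.5625%
frac_pts = (pts != IGNORE).mean()     # 2*20 / 128*128 ≈ 0.24%
print(f"grid labels:  {frac_grid:.2%} of pixels")
print(f"point labels: {frac_pts:.2%} of pixels")
```

With a spacing of 8 pixels, the grid annotation retains roughly 1.6% of the pixels, consistent with the "less than 2%" budget quoted in the abstract; point annotations are sparser still. A segmentation loss would then simply skip pixels equal to the ignore index.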
I. INTRODUCTION

Medical images are useful tools to assist doctors in multiple clinical scenarios and to plan for surgery. X-Ray, Magnetic Resonance Imaging (MRI), Computed Tomography (CT), and other imaging modalities are non-invasive methods that can help in diagnosis, pathology localization, anatomical studies, and other tasks [15].

Convolutional Neural Networks (CNNs) and their variants are the state of the art for object classification, detection, semantic segmentation, and other Computer Vision problems. Classical convolutional networks are known for their large data requirements, often hindering their usage in scenarios where data availability is limited, as in medical imaging. Relatively few medical datasets are publicly available due to privacy and ethical concerns [1,7,10,17,18,20,26].

Even among the public datasets, properly curated labeled data is limited due to the need for specialized annotators (i.e. radiologists), severely hampering the creation of general models for medical image understanding. While many datasets contain image-level annotations indicating the presence or absence of a set of medical conditions, the creation of pixel-level labels that allow for the training of semantic segmentation models is much more laborious. Volumetric image modalities such as MRI or CT scans further compound these difficulties by requiring per-slice annotations, often followed by cross-axes analysis to detect inconsistencies, which