With recent advances in deep learning research, generative models have achieved great success and play an increasingly important role in industrial applications. At the same time, technologies derived from generative methods, such as style transfer and image synthesis, are widely discussed among researchers. In this work, we treat generative methods as a possible solution to medical image augmentation. We propose a context-aware generative framework that can change the gray scale of CT scans with almost no semantic loss. By producing target images with a specific style/distribution and adding the generated images to the training set, we greatly increased the robustness of the segmentation model. In addition, we improved pixel segmentation accuracy by 2–4% over the original U-Net on spine segmentation. Lastly, we compared the images generated by networks using different feature extractors (VGG, ResNet, and DenseNet) and analyzed their style-transfer performance in detail.
With the widespread use of deep learning methods, semantic segmentation has improved greatly in recent years. However, many researchers have pointed out that repeated convolution and pooling operations cause substantial information loss during feature extraction. To solve this problem, various operations and network architectures have been suggested to compensate for the lost information. We observed a trend in many studies toward symmetric network designs, with the two halves representing the "encoding" and "decoding" stages. Through "upsampling" operations in the "decoding" stage, feature maps are reconstructed in a way that partly makes up for the losses in earlier layers. In this paper, we focus on upsampling operations, analyze them in detail, and compare the methods currently used in several well-known neural networks. We also draw on knowledge from image restoration to design a new upsampling layer (or operation) named the TGV upsampling algorithm. We successfully replaced the upsampling layers of previous models with our new method. We found that our model better preserves the detailed textures and edges of feature maps and achieves, on average, 1.4–2.3% higher accuracy than the original models.
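As a point of reference for the upsampling operations discussed above, the following minimal sketch contrasts two common baselines, nearest-neighbor and bilinear interpolation, on a tiny feature map. It is not the TGV upsampling algorithm, whose details are not given in the abstract; the function names and the corner-aligned sampling convention are assumptions made for illustration only.

```python
from typing import List

Grid = List[List[float]]


def upsample_nearest(fm: Grid, scale: int) -> Grid:
    """Repeat every value `scale` times along both axes (hypothetical helper)."""
    return [
        [fm[r // scale][c // scale] for c in range(len(fm[0]) * scale)]
        for r in range(len(fm) * scale)
    ]


def upsample_bilinear(fm: Grid, scale: int) -> Grid:
    """Bilinear interpolation; input and output corners are aligned (assumption)."""
    h, w = len(fm), len(fm[0])
    out_h, out_w = h * scale, w * scale
    out = []
    for r in range(out_h):
        # Map the output row back to a fractional input row.
        y = r * (h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        wy = y - y0
        row = []
        for c in range(out_w):
            # Map the output column back to a fractional input column.
            x = c * (w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            wx = x - x0
            # Interpolate horizontally on the two rows, then vertically.
            top = fm[y0][x0] * (1 - wx) + fm[y0][x1] * wx
            bottom = fm[y1][x0] * (1 - wx) + fm[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


feature_map = [[0.0, 1.0], [2.0, 3.0]]
nearest = upsample_nearest(feature_map, 2)
bilinear = upsample_bilinear(feature_map, 2)
```

Nearest-neighbor upsampling produces blocky, piecewise-constant output, while bilinear interpolation produces smooth ramps that can blur edges; this trade-off is the kind of texture and edge behavior that a restoration-inspired upsampling layer would aim to improve on.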
The image-to-image translation method aims to learn inter-domain mappings from paired or unpaired data. Although this technique has been widely used for visual prediction tasks, such as classification and image segmentation, and has achieved great results, it still fails to perform flexible translations when different mappings must be learned, especially for images containing multiple instances. To tackle this problem, we propose a generative framework, DAGAN (Domain-aware Generative Adversarial Network), that enables domains to learn diverse mapping relationships. We assume that an image is composed of a background domain and an instance domain, and we feed the two domains into different translation networks. Lastly, we integrate the translated domains into a complete image with smoothed labels to maintain realism. We examined the instance-aware framework on datasets generated by YOLO and confirmed that it is capable of generating images of equal or better diversity compared to current translation models.
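The abstract does not specify how the translated background and instance domains are merged. One plausible reading, sketched below purely as an assumption, is to blend the two translated images with a softened instance mask so that no hard seam appears at the boundary. The helpers `box_blur` and `composite`, the box-blur smoothing, and the `radius` parameter are all hypothetical and are not taken from DAGAN.

```python
from typing import List

Grid = List[List[float]]


def box_blur(mask: Grid, radius: int = 1) -> Grid:
    """Soften a binary mask by averaging each pixel with its in-bounds neighbours."""
    h, w = len(mask), len(mask[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            vals = [
                mask[rr][cc]
                for rr in range(max(0, r - radius), min(h, r + radius + 1))
                for cc in range(max(0, c - radius), min(w, c + radius + 1))
            ]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def composite(background: Grid, instance: Grid, mask: Grid, radius: int = 1) -> Grid:
    """Blend a translated instance image over a translated background image.

    The soft mask acts as a per-pixel weight: 1 keeps the instance pixel,
    0 keeps the background pixel, and values in between mix the two.
    """
    soft = box_blur(mask, radius)
    return [
        [
            soft[r][c] * instance[r][c] + (1 - soft[r][c]) * background[r][c]
            for c in range(len(mask[0]))
        ]
        for r in range(len(mask))
    ]


# Toy 3x3 example: a dark background, a bright instance, one masked pixel.
background = [[0.0] * 3 for _ in range(3)]
instance = [[1.0] * 3 for _ in range(3)]
mask = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
blended = composite(background, instance, mask)
hard = composite(background, instance, mask, radius=0)
```

With `radius=0` the mask stays binary and the result is a hard cut-and-paste; a larger radius spreads the instance contribution over neighbouring pixels, which is one simple way to reduce visible seams.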
With increasing industrial demand for high-quality semantic segmentation, hard-to-distinguish semantic boundaries pose a significant challenge to existing solutions. Inspired by the real-life experience that combining varied observations raises visual recognition confidence, we present the equipotential learning (EPL) method. This novel module transfers the predicted and ground-truth semantic labels to a self-defined potential domain to learn and infer decision boundaries along customized directions. The conversion to the potential domain is implemented via a lightweight differentiable anisotropic convolution without incurring any parameter overhead. In addition, two purpose-built loss functions, the point loss and the equipotential line loss, implement anisotropic field regression and category-level contour learning, respectively, enhancing prediction consistency in inter-class and intra-class boundary areas. More importantly, EPL is agnostic to network architecture and can therefore be plugged into most existing segmentation models. This paper is the first attempt to address the boundary segmentation problem with field regression and contour learning. Meaningful performance improvements on Pascal VOC 2012 and Cityscapes demonstrate that the proposed EPL module can benefit off-the-shelf fully convolutional network models when recognizing semantic boundary areas. Furthermore, extensive comparisons and analysis show the merits of EPL for distinguishing semantically similar and irregularly shaped categories.