We propose an end-to-end framework for the dense, pixelwise classification of satellite imagery with convolutional neural networks (CNNs). In our framework, CNNs are directly trained to produce classification maps from the input images. We first devise a fully convolutional architecture and demonstrate its relevance to the dense classification problem. We then address the issue of imperfect training data through a two-step training approach: CNNs are first initialized using a large amount of possibly inaccurate reference data, then refined on a small amount of accurately labeled data. To complete our framework, we design a multi-scale neuron module that alleviates the common trade-off between recognition and precise localization. A series of experiments shows that our networks take a large amount of context into account to provide fine-grained classification maps.
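The two-step training schedule can be illustrated with a deliberately tiny stand-in: a one-dimensional logistic regression trained with plain SGD, first on a larger set with imperfect labels and then refined, at a lower learning rate, on a small accurately labeled set. This is a minimal sketch under stated assumptions; the model, the function names (`sgd_logistic`, `two_step_training`) and the learning rates and epoch counts are hypothetical choices, not the CNN or the hyperparameters used in the paper.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def sgd_logistic(w, data, lr, epochs):
    """Plain SGD on the logistic loss for 1-D inputs.

    w = [bias, weight]; data = list of (x, y) pairs with y in {0, 1}.
    Returns a new weight list and leaves the input list untouched.
    """
    w = list(w)
    for _ in range(epochs):
        for x, y in data:
            err = sigmoid(w[0] + w[1] * x) - y  # gradient of the loss w.r.t. the logit
            w[0] -= lr * err
            w[1] -= lr * err * x
    return w


def two_step_training(noisy, clean, lr_init=0.1, lr_refine=0.01,
                      epochs_init=5, epochs_refine=20):
    """Hypothetical two-step schedule: initialize on abundant noisy labels,
    then refine the same weights on a small, accurately labeled set."""
    w = sgd_logistic([0.0, 0.0], noisy, lr_init, epochs_init)
    return sgd_logistic(w, clean, lr_refine, epochs_refine)
```

Reusing the same weights between the two calls, with a smaller learning rate in the second, is what makes the second step a refinement of the noisy-label initialization rather than a retraining from scratch.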
New challenges in remote sensing require pixel classification methods that, once trained on a certain dataset, generalize to other areas of the Earth, including regions where objects of the same type look significantly different. In the literature it is common to split a single image into training and test sets, used respectively to train a classifier and to assess its performance. However, this does not demonstrate that the classifier generalizes to other inputs. In this paper, we propose an aerial image labeling dataset that covers a wide range of urban settlement appearances from different geographic locations. Moreover, the cities included in the test set are different from those in the training set. We also experiment with convolutional neural networks on our dataset.
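The evaluation protocol described above amounts to splitting by city rather than by pixels or tiles of the same image. A minimal sketch, assuming each tile is recorded as a hypothetical `(city, tile_id)` pair; the function name and the city names in the usage are placeholders, not the dataset's actual file layout.

```python
def split_by_city(tiles, test_cities):
    """Split (city, tile_id) records so that no city contributes to both sets.

    Splitting by city, rather than by pixels or tiles drawn from the same
    image, is what lets the test error measure generalization to unseen areas.
    """
    test_cities = set(test_cities)
    train = [t for t in tiles if t[0] not in test_cities]
    test = [t for t in tiles if t[0] in test_cities]
    return train, test


# Usage with hypothetical city names:
tiles = [("city_a", 1), ("city_a", 2), ("city_b", 1), ("city_c", 1), ("city_c", 2)]
train, test = split_by_city(tiles, ["city_c"])
```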
For quantitative PET information, correction of tissue photon attenuation is mandatory. In conventional PET, the attenuation map is generally obtained either from a transmission scan, which uses a rotating radionuclide source, or from the CT scan in a combined PET/CT scanner. In the PET/MRI scanners currently under development, there is insufficient space for a rotating source; the attenuation map can instead be calculated from the MR image. This task is challenging because MR intensities correlate with proton densities and tissue-relaxation properties rather than with attenuation-related mass density. Methods: We used a combination of local pattern recognition and atlas registration, which captures global variation of anatomy, to predict pseudo-CT images from a given MR image. These pseudo-CT images were then used for attenuation correction, as the process would be performed in a PET/CT scanner. Results: For human brain scans, we show on a database of 17 MR/CT image pairs that our method reliably enables estimation of a pseudo-CT image from the MR image alone. On additional datasets of MRI/PET/CT triplets of human brain scans, we compare MRI-based attenuation correction with CT-based correction. Our approach enables PET quantification with a mean error of 3.2% for predefined regions of interest, which we found not to be clinically significant. Furthermore, our method is not specific to brain imaging, and we show promising initial results on 1 whole-body animal dataset. Conclusion: This method allows reliable MRI-based attenuation correction for human brain scans. Further work is necessary to validate the method for whole-body imaging.
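Once a pseudo-CT image is available, it is used exactly as a CT image would be. The sketch below illustrates that downstream step under simplifying assumptions: Hounsfield units (HU) are mapped to linear attenuation coefficients at 511 keV with the single linear scaling commonly used for soft tissue (clinical systems add a separate segment for bone, omitted here), and the attenuation correction factor (ACF) of a line of response is the exponential of the line integral of μ. The constant, the helper names and the omission of the bone segment are assumptions of this illustration, not details from the paper.

```python
import math

# Approximate linear attenuation coefficient of water at 511 keV, in cm^-1.
MU_WATER_511 = 0.096


def hu_to_mu(hu):
    """Map a CT number (HU) to mu at 511 keV with the soft-tissue scaling
    mu = mu_water * (1 + HU / 1000). Assumption: no separate bone segment."""
    return max(0.0, MU_WATER_511 * (1.0 + hu / 1000.0))


def attenuation_correction_factor(hu_along_lor, voxel_cm):
    """ACF = exp(integral of mu along the line of response), approximated
    by a Riemann sum over equally spaced voxels of length voxel_cm."""
    line_integral = sum(hu_to_mu(h) for h in hu_along_lor) * voxel_cm
    return math.exp(line_integral)


def relative_error_percent(value_mr, value_ct):
    """Signed error of an MR-based estimate relative to the CT-based reference."""
    return 100.0 * (value_mr - value_ct) / value_ct
```

A ray through air (-1000 HU) needs no correction (ACF = 1), whereas 20 cm of water-equivalent tissue gives exp(0.096 × 20), roughly a sevenfold correction, which is why errors in the pseudo-CT propagate directly into PET quantification.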
Abstract. We aim to color greyscale images automatically, without any manual intervention. The proposed colors could then be interactively corrected with user-provided color landmarks if necessary. Automatic colorization is nontrivial since there is usually no one-to-one correspondence between color and local texture. The contribution of our framework is that we deal directly with multimodality and estimate, for each pixel of the image to be colored, the probability distribution of all possible colors, instead of choosing the most probable color at the local level. We also predict the expected variation of color at each pixel, thus defining a nonuniform spatial coherency criterion. We then use graph cuts to maximize the probability of the whole colored image at the global level. We work in the L-a-b color space in order to approximate the human perception of distances between colors, and we use machine learning tools to extract as much information as possible from a dataset of colored examples. The resulting algorithm is fast, designed to be more robust to texture noise, and, above all, able to deal with ambiguity, in contrast to previous approaches.
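The global step combines per-pixel color probabilities with a spatially varying coherency term. The paper minimizes this kind of energy over the whole image with graph cuts; the sketch below swaps in exact dynamic programming (Viterbi) on a one-dimensional chain of pixels, which keeps the example short while showing the same trade-off. The exact energy form, the Euclidean distance between (a, b) chrominance values, and the `coherence` weights are assumptions of this illustration.

```python
import math


def colorize_chain(probs, colors, coherence):
    """Exact MAP labeling of a 1-D chain of pixels by dynamic programming.

    probs[i][c]  : predicted probability of color index c at pixel i
    colors[c]    : (a, b) chrominance of color index c
    coherence[i] : weight of the smoothness term between pixels i and i + 1
    Minimizes  sum_i -log probs[i][c_i]
             + sum_i coherence[i] * |colors[c_i] - colors[c_{i+1}]|.
    """
    k = len(colors)

    def unary(i, c):
        return -math.log(max(probs[i][c], 1e-12))  # guard against log(0)

    def pairwise(i, d, c):
        return coherence[i] * math.dist(colors[d], colors[c])

    cost = [unary(0, c) for c in range(k)]
    back = []  # back[i - 1][c] = best predecessor of color c at pixel i
    for i in range(1, len(probs)):
        new_cost, ptr = [], []
        for c in range(k):
            d = min(range(k), key=lambda d: cost[d] + pairwise(i - 1, d, c))
            new_cost.append(cost[d] + pairwise(i - 1, d, c) + unary(i, c))
            ptr.append(d)
        cost = new_cost
        back.append(ptr)

    # Backtrack from the cheapest final label.
    c = min(range(k), key=lambda c: cost[c])
    labels = [c]
    for ptr in reversed(back):
        c = ptr[c]
        labels.append(c)
    return labels[::-1]
```

With `coherence` set to zero the result is the most probable color at each pixel; increasing it lets confident neighbors override an ambiguous pixel, which is the behavior a low predicted color variation should encourage.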
Abstract. This paper proposes a framework for dealing with several problems related to the analysis of shapes. Two such related problems are the definition of the relevant set of shapes and the definition of a metric on it. Following a recent research monograph by Delfour and Zolésio [11], we consider the characteristic functions of the subsets of $\mathbb{R}^2$ and their distance functions. The $L^2$ norm of the difference of characteristic functions, and the $L^\infty$ and $W^{1,2}$ norms of the difference of distance functions, define interesting topologies, in particular the one induced by the well-known Hausdorff distance. Because of practical considerations arising from the fact that we deal with image shapes defined on finite grids of pixels, we restrict our attention to subsets of $\mathbb{R}^2$ of positive reach in the sense of Federer [16], with smooth boundaries of bounded curvature. For this particular set of shapes, we show that the three previous topologies are equivalent. The next problem we consider is that of warping one shape onto another by infinitesimal gradient descent, minimizing the corresponding distance. Because the distance function involves an infimum, it is not differentiable with respect to the shape. We propose a family of smooth approximations of the distance function which are continuous with respect to the Hausdorff topology, and hence with respect to the other two topologies. We compute the corresponding Gâteaux derivatives. They define deformation flows that can be used to warp one shape onto another by solving an initial value problem. We show several examples of this warping and prove properties of our approximations that relate to the existence of local minima. We then use this tool to produce computational definitions of the empirical mean and covariance of a set of shape examples. They yield an analog of the notion of principal modes of variation. We illustrate them on a variety of examples.
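For finite point samples of two shape boundaries, the Hausdorff distance and the non-differentiable infimum inside it are easy to write down. The sketch below also shows one standard way to smooth an infimum, a log-sum-exp soft minimum with sharpness parameter `beta`; it is offered only as an illustration of the idea and is not necessarily the family of approximations proposed in the paper. The point sets, function names and `beta` are hypothetical.

```python
import math


def distance_to_set(p, shape):
    """Exact distance from point p to a finite point set (an infimum)."""
    return min(math.dist(p, q) for q in shape)


def hausdorff(a, b):
    """Hausdorff distance between two finite point sets."""
    return max(max(distance_to_set(p, b) for p in a),
               max(distance_to_set(q, a) for q in b))


def soft_distance_to_set(p, shape, beta):
    """Log-sum-exp soft minimum: -1/beta * log(sum_q exp(-beta * |p - q|)).

    Shifted by the exact minimum for numerical stability. The result lies in
    [exact - log(len(shape)) / beta, exact].
    """
    ds = [math.dist(p, q) for q in shape]
    m = min(ds)
    return m - math.log(sum(math.exp(-beta * (d - m)) for d in ds)) / beta
```

As `beta` grows, the soft distance approaches the exact one from below, within log(n)/beta for n sample points, while remaining differentiable in p away from the sample points themselves.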
This paper tackles an important aspect of the variational problem underlying active contours: optimization by gradient flows. Classically, the definition of a gradient depends directly on the choice of an inner product structure. This consideration is largely absent from the active contours literature. Most authors, explicitly or implicitly, assume that the space of admissible deformations is ruled by the canonical $L^2$ inner product. The classical gradient flows reported in the literature are relative to this particular choice. Here, we investigate the relevance of using (i) other inner products, yielding other gradient descents, and (ii) other minimizing flows not deriving from any inner product. In particular, we show how to induce different degrees of spatial consistency into the minimizing flow, in order to decrease the probability of getting trapped in irrelevant local minima. We report numerical experiments indicating that the sensitivity of the active contours method to initial conditions, which seriously limits its applicability and efficiency, is alleviated by our application-specific spatially coherent minimizing flows. We show that the choice of the inner product can be seen as a prior on the deformation fields, and we present an extension of the definition of the gradient toward more general priors.
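To make the role of the inner product concrete: if the contour is sampled at n points and the usual $L^2$ gradient is g, then the gradient for an $H^1$-type inner product $\langle u, v \rangle = \int u v + \lambda \int u' v'$ is the h solving $(I - \lambda \Delta) h = g$. On a closed contour this periodic system is diagonalized by the discrete Fourier transform, so high-frequency components of the deformation are damped while the mean component is preserved. The sketch below is a minimal illustration under those assumptions (uniform sampling, scalar normal velocities, a naive O(n²) DFT); `lam` and the function names are hypothetical, and the inner products and flows studied in the paper are more general.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2)); fine for short contours."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse of dft(); returns the real part, assuming a real-valued signal."""
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def sobolev_gradient(g_l2, lam):
    """Solve (I - lam * Lap) h = g_l2 on a closed, uniformly sampled contour.

    Lap is the periodic second difference h[t-1] - 2 h[t] + h[t+1], whose
    Fourier eigenvalues are -(2 - 2 cos(2 pi k / n)).
    """
    n = len(g_l2)
    G = dft(g_l2)
    H = [G[k] / (1.0 + lam * (2.0 - 2.0 * math.cos(2.0 * math.pi * k / n)))
         for k in range(n)]
    return idft(H)
```

Larger `lam` yields smoother, more spatially coherent deformation fields; `lam = 0` recovers the $L^2$ gradient, and the highest-frequency (alternating) component is divided by 1 + 4·lam.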
We propose a convolutional neural network (CNN) model for remote sensing image classification. Using CNNs provides us with a means of learning contextual features for large-scale image labeling. Our network consists of four stacked convolutional layers that downsample the image and extract relevant features. On top of these, a deconvolutional layer upsamples the data back to the initial resolution, producing a final dense image labeling. Contrary to previous frameworks, our network contains only convolution and deconvolution operations. Experiments on aerial images show that our network produces more accurate classifications in less computation time.
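Because the network uses only convolutions and a deconvolution (transposed convolution), its output resolution follows directly from the standard size formulas, ⌊(n + 2p − k)/s⌋ + 1 for a convolution and (n − 1)s − 2p + k for a transposed convolution (dilation 1, no output padding). The layer configuration below is hypothetical, chosen only so that two stride-2 convolutions divide the resolution by four and one stride-4 deconvolution restores it; the kernel sizes, strides and paddings are not taken from the paper.

```python
def conv_out(n, k, s, p):
    """Output length of a convolution: kernel k, stride s, padding p."""
    return (n + 2 * p - k) // s + 1


def deconv_out(n, k, s, p):
    """Output length of a transposed convolution (no output padding)."""
    return (n - 1) * s - 2 * p + k


# Hypothetical configuration: (kind, kernel, stride, padding).
LAYERS = [
    ("conv", 4, 2, 1),    # n -> n/2
    ("conv", 4, 2, 1),    # n/2 -> n/4
    ("conv", 3, 1, 1),    # resolution kept
    ("conv", 3, 1, 1),    # resolution kept
    ("deconv", 8, 4, 2),  # n/4 -> n
]


def output_size(n, layers=LAYERS):
    """Spatial size after each layer, starting with the input size."""
    sizes = [n]
    for kind, k, s, p in layers:
        n = conv_out(n, k, s, p) if kind == "conv" else deconv_out(n, k, s, p)
        sizes.append(n)
    return sizes
```

For input sizes that are multiples of four, the output matches the input resolution, so every pixel receives a label; other sizes would need padding or cropping.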