Dense semantic labeling is an important task in high-resolution remote sensing imagery research and has been widely used in land-use analysis and environmental protection. With the recent success of fully convolutional networks (FCN), various network architectures have substantially improved performance. Among them, atrous spatial pyramid pooling (ASPP) and the encoder-decoder are two successful structures. The former captures multi-scale contextual information through multiple effective fields-of-view, while the latter recovers spatial information to obtain sharper object boundaries. In this study, we propose a more efficient fully convolutional network that combines the advantages of both structures. Our model uses the deep residual network (ResNet) followed by ASPP as the encoder and, as the decoder, combines two scales of high-level features with the corresponding low-level features at the upsampling stage. We further develop a multi-scale loss function to enhance the learning procedure. In postprocessing, a novel superpixel-based dense conditional random field is employed to refine the predictions. We evaluate the proposed method on the Potsdam and Vaihingen datasets, and the experimental results demonstrate that our method performs better than other machine learning or deep learning methods. Compared with the state-of-the-art DeepLab_v3+, our model gains 0.4% and 0.6% in overall accuracy on these two datasets, respectively.
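As an illustration of how atrous convolution yields multiple effective fields-of-view, the following is a minimal 1D sketch in plain Python. It is not the network described above: the function names, the shared kernel, and the rates used in the usage note are hypothetical, and a real ASPP module would operate on 2D feature maps with learned kernels and padding so that all branches have the same size. The one fact it relies on is that a kernel of size k dilated by rate r spans k + (k - 1)(r - 1) input samples.

```python
def effective_kernel_size(kernel_size: int, rate: int) -> int:
    """Number of input samples spanned by a kernel dilated with the given rate."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def atrous_conv1d(signal, kernel, rate):
    """Valid-mode 1D atrous cross-correlation: kernel taps sit `rate` samples apart."""
    span = effective_kernel_size(len(kernel), rate)
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(w * signal[start + i * rate] for i, w in enumerate(kernel)))
    return out


def aspp_1d(signal, kernel, rates):
    """Toy pyramid: one atrous branch per rate (branches are NOT padded or fused here)."""
    return {rate: atrous_conv1d(signal, kernel, rate) for rate in rates}
```

For example, `aspp_1d([1, 2, 3, 4, 5], [1, 1, 1], rates=(1, 2))` produces one branch that sums three adjacent samples and one that sums every second sample, so the same three-tap kernel sees a span of 3 and 5 samples respectively.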
We present new multilayer joint gait-pose manifolds (multilayer JGPMs) for complex human gait motion modeling, where three latent variables are defined jointly in a low-dimensional manifold to represent a variety of body configurations. Specifically, the pose variable (along the pose manifold) denotes a specific stage in a walking cycle; the gait variable (along the gait manifold) represents different walking styles; and the linear scale variable characterizes the maximum stride in a walking cycle. We discuss two kinds of topological priors for coupling the pose and gait manifolds, i.e., cylindrical and toroidal, to examine their effectiveness and suitability for motion modeling. We use a topologically-constrained Gaussian process (GP) latent variable model to learn the multilayer JGPMs, and we introduce two new techniques to facilitate model learning under limited training data. The first is training data diversification, which creates a set of simulated motion data with different strides. The second is topology-aware local learning, which speeds up model learning by taking advantage of the local topological structure. The experimental results on the Carnegie Mellon University motion capture data demonstrate the advantages of our proposed multilayer models over several existing GP-based motion models in terms of the overall performance of human gait motion modeling.
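The difference between the cylindrical and toroidal priors can be made concrete with a small geometric sketch. This is only an illustration of the two topologies, not the GP latent variable model from the abstract. It assumes that the pose variable is cyclic (it indexes a stage of the walking cycle), that the gait variable is an open axis on the cylinder and a closed loop on the torus, and that both are normalized to [0, 1). The function names and radii are hypothetical, and the linear scale variable is not modeled.

```python
import math


def cylinder_point(pose, gait, radius=1.0):
    """Cylindrical prior: pose wraps around a circle, gait runs along an open axis."""
    angle = 2.0 * math.pi * pose
    return (radius * math.cos(angle), radius * math.sin(angle), gait)


def torus_point(pose, gait, major=2.0, minor=1.0):
    """Toroidal prior: both pose and gait wrap around (two coupled circles)."""
    u = 2.0 * math.pi * pose  # position in the walking cycle
    v = 2.0 * math.pi * gait  # position along the closed gait loop
    ring = major + minor * math.cos(v)
    return (ring * math.cos(u), ring * math.sin(u), minor * math.sin(v))
```

On both surfaces, pose values 0 and 1 map to the same point, which encodes the periodicity of walking. Only on the torus do gait values 0 and 1 also coincide, so the choice of prior amounts to a choice about whether walking styles form a closed loop.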
Efficient and accurate semantic segmentation is a key technique for automatic remote sensing image analysis. While there have been many segmentation methods based on traditional hand-crafted feature extractors, it is still challenging to process high-resolution and large-scale remote sensing images. In this work, a novel patch-wise semantic segmentation method with a new training strategy based on fully convolutional networks is presented to segment common land resources. First, to handle high-resolution images, the images are split into local patches and a patch-wise network is built. Second, the training data is preprocessed in several ways to address specific characteristics of remote sensing images, i.e., color imbalance, object rotation variations and lens distortion. Third, a multi-scale training strategy is developed to address the severe scale variation problem. In addition, the impact of a conditional random field (CRF) is studied as a means of improving precision. The proposed method was evaluated on a dataset collected from a capital city in West China with the Gaofen-2 satellite. The dataset contains ten common land resources (Grassland, Road, etc.). The experimental results show that the proposed algorithm achieves a mean intersection over union (MIoU) of 54.96% and outperforms other state-of-the-art methods in remote sensing image segmentation.
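For readers unfamiliar with the reported metric, here is a minimal sketch of how mean intersection over union can be computed from flattened label lists. It is not the authors' evaluation code. The function name is hypothetical, and one convention here is an assumption: classes that appear in neither the ground truth nor the prediction are skipped rather than counted as zero.

```python
def mean_iou(y_true, y_pred, num_classes):
    """Mean intersection over union across classes, for flattened integer label lists."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    intersection = [0] * num_classes
    union = [0] * num_classes
    for t, p in zip(y_true, y_pred):
        if t == p:
            # Pixel counts once toward both intersection and union of its class.
            intersection[t] += 1
            union[t] += 1
        else:
            # Mismatch enlarges the union of both the true and the predicted class.
            union[t] += 1
            union[p] += 1
    # Assumption: classes absent from both inputs are excluded from the mean.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    if not ious:
        raise ValueError("no labelled pixels to evaluate")
    return sum(ious) / len(ious)
```

With `y_true = [0, 0, 1, 1]` and `y_pred = [0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.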
We propose a generalized Sum-of-Gaussians (G-SoG) model for statistical 3D shape modeling that is applied to human pose tracking from a single depth sensor. G-SoG generalizes the original SoG model by using far fewer anisotropic Gaussians while offering better flexibility and adaptability. Both SoG and G-SoG are used for pose tracking in different roles: the former represents the observed point cloud data through an efficient Octree partitioning, and the latter is embedded with a quaternion-based articulated skeleton to create a standard human template model. We derive a differentiable similarity function between SoG and G-SoG that can be optimized analytically, not only to learn a subject-specific articulated model but also to support sequential pose tracking, where two additional terms (visibility and continuity) are also involved. Our algorithm is simple yet effective and can achieve real-time performance. The experimental results on a public depth dataset are promising and competitive when compared with state-of-the-art algorithms.
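To show why a similarity between two sums of Gaussians can be differentiated and optimized analytically, the sketch below evaluates the closed-form overlap integral for the simpler isotropic case. It does not implement the anisotropic G-SoG model or the visibility and continuity terms mentioned above. The function names are hypothetical, and it assumes unweighted, unnormalized 3D Gaussians of the form exp(-||x - mu||^2 / (2 sigma^2)), for which the integral of the product of two Gaussians over all of space is (2 pi sigma_a^2 sigma_b^2 / (sigma_a^2 + sigma_b^2))^(3/2) * exp(-||mu_a - mu_b||^2 / (2 (sigma_a^2 + sigma_b^2))).

```python
import math


def gaussian_overlap(mu_a, sigma_a, mu_b, sigma_b):
    """Closed-form integral over R^3 of the product of two isotropic, unnormalized Gaussians."""
    var_sum = sigma_a ** 2 + sigma_b ** 2
    dist_sq = sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
    scale = (2.0 * math.pi * sigma_a ** 2 * sigma_b ** 2 / var_sum) ** 1.5
    return scale * math.exp(-dist_sq / (2.0 * var_sum))


def sog_similarity(model, observation):
    """Sum of pairwise overlaps between two lists of (mean, sigma) Gaussians."""
    return sum(
        gaussian_overlap(mu_a, sigma_a, mu_b, sigma_b)
        for mu_a, sigma_a in model
        for mu_b, sigma_b in observation
    )
```

Because each pairwise term is a smooth function of the means and standard deviations, the total similarity has closed-form gradients with respect to the model parameters, which is what makes gradient-based pose fitting practical in this family of methods.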