Abstract. Detecting objects, estimating their pose, and recovering 3D shape information are critical problems in many vision and robotics applications. This paper addresses these needs by proposing a new method called DEHV, a Depth-Encoded Hough Voting detection scheme. Inspired by the Hough voting scheme introduced in [13], DEHV incorporates depth information into the process of learning distributions of image features (patches) representing an object category. DEHV takes advantage of the interplay between the scale of each object patch in the image and its distance (depth) from the corresponding physical patch attached to the 3D object. DEHV jointly detects objects, infers their categories, estimates their pose, and infers/decodes the objects' depth maps from either a single image (when no depth map is available at test time) or a single image augmented with a depth map (when one is available at test time). Extensive quantitative and qualitative experimental analysis on existing datasets [6,9,22] and a newly proposed 3D table-top object category dataset shows that DEHV obtains competitive detection and pose estimation results as well as convincing 3D shape reconstruction from a single uncalibrated image. Finally, we demonstrate that our technique can be successfully employed as a key building block in two application scenarios: highly accurate 6 degrees of freedom (6 DOF) pose estimation and 3D object modeling.
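The interplay between patch scale and depth mentioned above can be illustrated with the standard pinhole camera model. The abstract does not give DEHV's actual formulation, so the following is only a minimal sketch under that assumption; the function names and the focal length value are hypothetical.

```python
def projected_size(physical_size, depth, focal_length):
    """Image-plane size (pixels) of a patch under a pinhole camera.

    Assumption: pinhole model, patch roughly parallel to the image plane.
    physical_size and depth share one unit (e.g. metres);
    focal_length is expressed in pixels.
    """
    return focal_length * physical_size / depth


def depth_from_scale(physical_size, image_size, focal_length):
    """Invert the relation: recover depth from the observed patch size."""
    return focal_length * physical_size / image_size


if __name__ == "__main__":
    f = 500.0  # hypothetical focal length in pixels
    size_px = projected_size(0.1, 2.0, f)
    print(size_px)                            # patch size at 2 m
    print(depth_from_scale(0.1, size_px, f))  # depth recovered from that size
```

Under this model, doubling the depth halves the patch scale in the image, which is why an observed scale carries information about depth when the physical patch size is known.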
Abstract. This paper presents an approach for automatically synthesizing and re-synthesizing a hybrid controller that guarantees a robot will exhibit a user-defined high-level behavior while exploring a partially known workspace (map). The approach includes dynamically adjusting the discrete abstraction of the workspace as new regions are detected by the robot's sensors, automatically rewriting the specification (formally defined using Linear Temporal Logic), and re-synthesizing the control while preserving the robot state and its history of task completion. The approach is implemented within the LTLMoP toolkit and is demonstrated using a Pioneer 3-DX in the lab.
High-quality, high-fidelity removal of noise from the Electrocardiogram (ECG) signal is of great significance to the auxiliary diagnosis of cardiac diseases. Because traditional denoising methods serve a single function and preserve signal details poorly after denoising, a new ECG denoising method based on the combination of the Generative Adversarial Network (GAN) and the Residual Network is proposed. The method is based on the GAN structure and restructures both the generator and the discriminator. In the generator network, residual blocks and skip connections are used to deepen the network structure and better capture the in-depth information in the ECG signal. In the discriminator network, the ResNet framework is used. To optimize the noise reduction process and address the lack of local relevance when the ECG is considered globally, the differential function and the overall function of the maximum local difference are added to the loss function. The experimental results show that the proposed method performs better than the S-Transform (S-T) algorithm, the Wavelet Transform (WT) algorithm, the Stacked Denoising Autoencoder (S-DAE) algorithm, and the Improved Denoising Autoencoder (I-DAE) algorithm. On the Massachusetts Institute of Technology and Beth Israel Hospital (MIT-BIH) noise stress test database, the Root Mean Square Error (RMSE) of this method is 0.0102 and the Signal-to-Noise Ratio (SNR) is 40.8526 dB. Compared with the most advanced methods in the experiments, our method improves the SNR by 88.57% on average. In addition to the comparison experiments at three noise intensities, further noise reduction experiments are performed at four noise intensities. The experimental results support the validity of the model: our method effectively retains the important information conveyed by the original signal.
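The two metrics reported above can be computed as in the following minimal sketch. It assumes the common definitions of RMSE and output SNR in decibels for denoising; the abstract does not state the exact formulas the authors used, and the function names are illustrative.

```python
import numpy as np


def rmse(clean, denoised):
    """Root mean square error between the clean and denoised signals."""
    clean = np.asarray(clean, dtype=float)
    denoised = np.asarray(denoised, dtype=float)
    return float(np.sqrt(np.mean((clean - denoised) ** 2)))


def snr_db(clean, denoised):
    """Output SNR in dB: clean-signal power over residual-noise power.

    Assumption: SNR = 10 * log10(sum(clean^2) / sum((clean - denoised)^2)).
    """
    clean = np.asarray(clean, dtype=float)
    denoised = np.asarray(denoised, dtype=float)
    residual_power = np.sum((clean - denoised) ** 2)
    signal_power = np.sum(clean ** 2)
    return float(10.0 * np.log10(signal_power / residual_power))


if __name__ == "__main__":
    clean = [1.0, 1.0, 1.0, 1.0]
    denoised = [1.1, 0.9, 1.1, 0.9]
    print(rmse(clean, denoised))
    print(snr_db(clean, denoised))
```

A lower RMSE and a higher SNR both indicate that the denoised signal is closer to the clean reference.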
Sparse representation (SR), or sparse coding (SC), which assumes that a data vector can be sparsely represented as a linear combination of basis vectors, has been successfully applied in machine learning and computer vision tasks. To solve the sparse representation problem, a regularization technique is applied to constrain the sparsity of the coefficients of the linear representation. In this paper, a reconstruction-error-based adaptive regularization parameter estimation method is proposed to improve the representation ability of SR. The adaptive regularization parameter aims to balance the reconstruction error against the sparsity of the coefficient vector and to minimize the reconstruction error. Substantial experiments are performed on several benchmark databases. Simulation results demonstrate that this adaptive regularization parameter estimation method can find a proper parameter for each test sample and, consequently, can improve the accuracy of SR and eliminate a time-consuming cross-validation process.
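The role of the regularization parameter can be seen in a minimal sketch of L1-regularized sparse coding solved with ISTA (iterative soft thresholding). This is a generic solver with a fixed parameter `lam`, not the adaptive estimation method proposed in the paper, whose details the abstract does not give; the names `ista`, `D`, `x`, and `lam` are illustrative.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def ista(D, x, lam, n_iter=500):
    """Minimize 0.5 * ||x - D a||^2 + lam * ||a||_1 over coefficients a.

    D: dictionary with basis vectors as columns; x: data vector;
    lam: regularization parameter trading reconstruction error for sparsity.
    """
    step_inv = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)
        a = soft_threshold(a - grad / step_inv, lam / step_inv)
    return a


if __name__ == "__main__":
    D = np.eye(3)
    x = np.array([3.0, 0.5, -2.0])
    for lam in (1.0, 2.5):
        a = ista(D, x, lam)
        err = np.linalg.norm(x - D @ a)
        print(lam, a, err)  # larger lam: fewer nonzeros, larger error
```

A larger `lam` yields a sparser coefficient vector at the cost of a larger reconstruction error, which is the trade-off that a per-sample choice of the parameter has to balance.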
Traditional heatmap-based approaches to human pose estimation usually suffer from drawbacks such as high network complexity or suboptimal accuracy. Focusing on multi-person pose estimation without heatmaps, this paper proposes an end-to-end, lightweight human pose estimation network that adds a multi-scale coordinate attention mechanism to the YOLO-Pose network, improving overall performance while keeping the network lightweight. Specifically, the lightweight network GhostNet was first integrated into the backbone to alleviate model redundancy and produce a significant number of effective feature maps. Then, by incorporating the coordinate attention mechanism, the sensitivity of the proposed network to direction and location was enhanced. Finally, the BiFPN module was fused in to balance the feature information of different scales and further improve the expressive ability of the convolutional features. Experiments on the COCO 2017 validation dataset showed that, compared with the baseline method YOLO-Pose, the average accuracy of the proposed network improved by 4.8% while the number of network parameters and computations was reduced. The experimental results demonstrated that the proposed method can improve the detection accuracy of human pose estimation while keeping the model lightweight.