Generating a scene graph to describe the object interactions inside an image has gained increasing interest in recent years. However, most previous methods use complicated structures with slow inference speed or rely on external data, which limits the model's use in real-life scenarios. To improve the efficiency of scene graph generation, we propose a subgraph-based connection graph that concisely represents the scene graph during inference. A bottom-up clustering method is first used to factorize the entire graph into subgraphs, where each subgraph contains several objects and a subset of their relationships. By replacing the numerous relationship representations of the scene graph with fewer subgraph and object features, the computation in the intermediate stage is significantly reduced. In addition, spatial information is maintained in the subgraph features, which is leveraged by our proposed Spatial-weighted Message Passing (SMP) structure and Spatial-sensitive Relation Inference (SRI) module to facilitate relationship recognition. On the recent Visual Relationship Detection and Visual Genome datasets, our method outperforms the state-of-the-art methods in both accuracy and speed. Code has been made publicly available.
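The bottom-up clustering step described above can be illustrated with a small sketch. This is an assumption-laden toy version, not the paper's exact procedure: candidate object pairs are greedily merged into a shared subgraph whenever the union boxes of their object proposals overlap strongly (the `iou` criterion and the `thresh=0.5` cutoff are illustrative choices).

```python
# Hypothetical sketch of bottom-up subgraph clustering: merge candidate
# object pairs whose union boxes overlap above a threshold. The merge
# criterion and threshold are illustrative assumptions.

def union_box(a, b):
    """Smallest box enclosing boxes a and b, each (x1, y1, x2, y2)."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def iou(a, b):
    """Intersection-over-union of two boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / float(area(a) + area(b) - inter)

def cluster_pairs(boxes, pairs, thresh=0.5):
    """Greedily assign each object pair to an existing subgraph whose
    union box overlaps it, otherwise start a new subgraph.
    Returns a list of subgraphs, each a set of object indices."""
    subgraphs = []  # list of (running union box, set of object ids)
    for i, j in pairs:
        ub = union_box(boxes[i], boxes[j])
        for k, (sb, members) in enumerate(subgraphs):
            if iou(ub, sb) >= thresh:
                subgraphs[k] = (union_box(sb, ub), members | {i, j})
                break
        else:
            subgraphs.append((ub, {i, j}))
    return [members for _, members in subgraphs]
```

The point of the factorization is visible in the shapes: N objects with O(N^2) pairwise relationships collapse to a handful of subgraph features plus N object features, which is where the claimed reduction in intermediate computation comes from.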
Figure 1: Given static 3D scans or 3D scan sequences (in pink), we estimate the naked shape under clothing (beige). Our method obtains accurate results by minimizing an objective function that captures the visible details of the skin, while being robust to clothing. We show several pairs of clothed scan sequences and the estimated body shape underneath.

Abstract: We address the problem of estimating human pose and body shape from 3D scans over time. Reliable estimation of 3D body shape is necessary for many applications including virtual try-on, health monitoring, and avatar creation for virtual reality. Scanning bodies in minimal clothing, however, presents a practical barrier to these applications. We address this problem by estimating body shape under clothing from a sequence of 3D scans. Previous methods that have exploited body models produce smooth shapes lacking personalized details. We contribute a new approach to recover a personalized shape of the person. The estimated shape deviates from a parametric model to fit the 3D scans. We demonstrate the method using high quality 4D data as well as sequences of visual hulls extracted from multi-view images. We also make available BUFF, a new 4D dataset that enables quantitative evaluation (http://buff.is.tue.mpg.de/). Our method outperforms the state of the art in both pose estimation and shape estimation, qualitatively and quantitatively.
It is well known that blinks, yawns, and heart rate changes give clues about a human's mental state, such as drowsiness and fatigue. In this paper, image sequences, as the raw data, are captured from smartphones which serve as non-contact optical sensors. Video streams containing the subject's facial region are analyzed to identify the physiological sources that are mixed in each image. We then propose a method to extract the blood volume pulse, eye blink, and yawn signals as multiple independent sources simultaneously by multi-channel second-order blind identification (SOBI), without any other sophisticated processing such as eye and mouth localization. An overall decision is made by analyzing the separated source signals in parallel to determine the driver's driving state. The robustness of the proposed method is tested under various illumination contexts and a variety of head motion modes. Experiments on 15 subjects show that multi-channel SOBI presents a promising framework to accurately detect drowsiness by merging multi-physiological information in a less complex way. INDEX TERMS Yawn, blink, blood volume pulse (BVP), drowsiness detection, second-order blind identification (SOBI).
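The second-order idea behind SOBI can be sketched with its simplest single-lag relative, AMUSE: whiten the mixed channels, then diagonalize one time-lagged covariance to recover the sources. This is a minimal illustrative sketch, not the paper's multi-channel SOBI, which jointly diagonalizes covariances at several lags for robustness.

```python
import numpy as np

def amuse(X, lag=1):
    """Separate linearly mixed signals X (channels x samples) with the
    single-lag AMUSE algorithm, a simplified relative of SOBI.
    Returns estimated sources, up to permutation, sign, and scale."""
    X = X - X.mean(axis=1, keepdims=True)
    # Whitening: decorrelate channels and scale them to unit variance.
    C0 = X @ X.T / X.shape[1]
    d, E = np.linalg.eigh(C0)
    W = E @ np.diag(1.0 / np.sqrt(d)) @ E.T
    Z = W @ X
    # One time-lagged covariance of the whitened data; symmetrize it,
    # then its eigenvectors give the remaining rotation to the sources.
    C1 = Z[:, lag:] @ Z[:, :-lag].T / (Z.shape[1] - lag)
    C1 = (C1 + C1.T) / 2
    _, V = np.linalg.eigh(C1)
    return V.T @ Z
```

Separation works when the sources have distinct autocorrelations at the chosen lag; in practice the lag (or, for SOBI, the set of lags) is picked so that this holds for the physiological signals of interest.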
The Global Positioning System (GPS) has been used in many aerial and terrestrial high-precision positioning applications. Multipath degrades positioning and navigation performance. This paper proposes a convolutional neural network based carrier-phase multipath detection method. The method is based on the fact that the features of multipath characteristics in multipath-contaminated data can be learned and identified by a convolutional neural network. The proposed method is validated with simulated and real GPS data and compared with existing multipath mitigation methods in the position domain. The results show the proposed method can detect about 80% of multipath errors (i.e., recall) in both simulated and real data. The impact of the proposed method on positioning accuracy is demonstrated with two datasets; an 18–30% improvement is obtained by down-weighting the detected multipath measurements. The focus of this paper is on the development and testing of the proposed convolutional neural network based multipath detection algorithm.
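The down-weighting step mentioned above can be sketched as a weighted least-squares position update in which measurements flagged by the detector keep a small, nonzero weight rather than being discarded. The function name, weight values, and matrix shapes below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def wls_position(G, residuals, flags, w_clean=1.0, w_mp=0.1):
    """One weighted least-squares update that down-weights measurements
    flagged as multipath instead of discarding them (illustrative sketch).
    G:         (n x 4) geometry matrix (unit line-of-sight vectors plus a
               receiver-clock column),
    residuals: (n,) observed-minus-predicted range residuals,
    flags:     (n,) booleans, True where the detector reported multipath.
    Returns the state correction dx = (dxyz, d_clock)."""
    w = np.where(flags, w_mp, w_clean)
    # Solve min_dx || sqrt(W) (G dx - r) ||^2 via the scaled system.
    sw = np.sqrt(w)
    dx, *_ = np.linalg.lstsq(sw[:, None] * G, sw * residuals, rcond=None)
    return dx
```

Keeping a small weight on flagged measurements preserves geometry when few satellites are visible, while still suppressing most of the multipath bias; setting `w_mp` to zero would reduce this to outright exclusion.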