Figure 1: Visual results of different super-resolution methods at scale factor 8 (panels compare Input/Bicubic, Target, VDSR, URDGN, SRResNet, FSRNet (Ours), and FSRGAN (Ours), with PSNR/SSIM reported per method).

Face Super-Resolution (SR) is a domain-specific super-resolution problem. Specific facial prior knowledge can be leveraged to better super-resolve face images. We present a novel deep end-to-end trainable Face Super-Resolution Network (FSRNet), which makes full use of geometric priors, i.e., facial landmark heatmaps and parsing maps, to super-resolve very low-resolution (LR) face images without requiring well-aligned inputs. Specifically, we first construct a coarse SR network to recover a coarse high-resolution (HR) image. The coarse HR image is then sent to two branches, a fine SR encoder and a prior-information estimation network, which extract image features and estimate landmark heatmaps/parsing maps, respectively. Both the image features and the prior information are fed into a fine SR decoder to recover the HR image. To further generate realistic faces, we propose the Face Super-Resolution Generative Adversarial Network (FSRGAN), which incorporates an adversarial loss into FSRNet. Moreover, we introduce two related tasks, face alignment and parsing, as new evaluation metrics for face SR, which address the inconsistency of classic metrics with visual perception. Extensive benchmark experiments show that FSRNet and FSRGAN significantly outperform the state of the art for very LR face SR, both quantitatively and qualitatively. Code will be made available upon publication.
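The following is a minimal PyTorch sketch of the coarse-to-fine idea described above: a coarse SR network, a prior-estimation branch, and a fine encoder-decoder that fuses image features with the estimated priors. Layer widths, depths, and the number of landmark/parsing maps are illustrative placeholders, not the authors' exact FSRNet configuration.

```python
# Sketch of the coarse-to-fine face SR pipeline described in the abstract.
# Channel counts and the prior head (68 landmarks + 11 parsing classes) are assumptions.
import torch
import torch.nn as nn

class CoarseSR(nn.Module):
    """Recovers a rough HR image from the (bicubically upsampled) LR input."""
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1),
        )
    def forward(self, x):
        return self.body(x) + x  # residual prediction

class PriorEstimator(nn.Module):
    """Predicts landmark heatmaps / parsing maps from the coarse HR image."""
    def __init__(self, ch=64, n_maps=68 + 11):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, n_maps, 3, padding=1),
        )
    def forward(self, x):
        return self.body(x)

class FineSR(nn.Module):
    """Encoder-decoder that fuses image features with the estimated priors."""
    def __init__(self, ch=64, n_maps=68 + 11):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True))
        self.decoder = nn.Sequential(
            nn.Conv2d(ch + n_maps, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1),
        )
    def forward(self, coarse, prior):
        feat = self.encoder(coarse)
        return self.decoder(torch.cat([feat, prior], dim=1))

class FaceSRNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.coarse, self.prior, self.fine = CoarseSR(), PriorEstimator(), FineSR()
    def forward(self, lr_up):                  # lr_up: upsampled LR face
        coarse = self.coarse(lr_up)
        prior = self.prior(coarse)              # landmark heatmaps + parsing maps
        return self.fine(coarse, prior), prior

if __name__ == "__main__":
    sr, prior = FaceSRNet()(torch.randn(1, 3, 128, 128))
    print(sr.shape, prior.shape)
```

The adversarial variant (FSRGAN) would add a discriminator and an adversarial loss term on top of this generator, analogous to standard GAN-based SR training.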
Hands appear very often in egocentric video, and their appearance and pose give important cues about what people are doing and what they are paying attention to. But existing work in hand detection has made strong assumptions that work well in only simple scenarios, such as with limited interaction with other people or in lab settings. We develop methods to locate and distinguish between hands in egocentric video using strong appearance models with Convolutional Neural Networks, and introduce a simple candidate region generation approach that outperforms existing techniques at a fraction of the computational cost. We show how these high-quality bounding boxes can be used to create accurate pixelwise hand regions, and as an application, we investigate the extent to which hand segmentation alone can distinguish between different activities. We evaluate these techniques on a new dataset of 48 first-person videos of people interacting in realistic environments, with pixel-level ground truth for over 15,000 hand instances.
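As a rough illustration of the last point, turning a detected hand bounding box into a pixelwise region, here is a small sketch using OpenCV's GrabCut initialised from the box alone. This is one possible refinement step, not the authors' pipeline; the image path and box coordinates are hypothetical.

```python
# Sketch: refine a detected hand bounding box into a pixelwise mask with GrabCut.
# Not the paper's method; paths and box coordinates are placeholders.
import cv2
import numpy as np

def box_to_mask(image_bgr, box, iters=5):
    """box = (x, y, w, h) from the hand detector; returns a binary hand mask."""
    mask = np.zeros(image_bgr.shape[:2], dtype=np.uint8)
    bgd = np.zeros((1, 65), np.float64)
    fgd = np.zeros((1, 65), np.float64)
    cv2.grabCut(image_bgr, mask, box, bgd, fgd, iters, cv2.GC_INIT_WITH_RECT)
    # Definite or probable foreground pixels become the hand region.
    fg = (mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD)
    return fg.astype(np.uint8)

if __name__ == "__main__":
    frame = cv2.imread("frame.jpg")            # hypothetical egocentric frame
    if frame is not None:
        hand_mask = box_to_mask(frame, (120, 200, 180, 160))
        print("hand pixels:", int(hand_mask.sum()))
```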
For human pose estimation in monocular images, joint occlusions and overlapping body parts often lead to deviated pose predictions. Under these circumstances, biologically implausible poses may be produced. In contrast, human vision is able to predict poses by exploiting geometric constraints of joint inter-connectivity. To incorporate priors about the structure of human bodies, we propose a novel structure-aware convolutional network that implicitly takes such priors into account during training of the deep network. Explicitly learning such constraints is typically challenging. Instead, we design discriminators to distinguish real poses from fake ones (such as biologically implausible ones). If the pose generator (G) produces results that the discriminator fails to distinguish from real ones, the network has successfully learned the priors. To better capture the structural dependencies of human body joints, the generator G is designed in a stacked multi-task manner to predict poses as well as occlusion heatmaps. The pose and occlusion heatmaps are then sent to the discriminators to predict the likelihood of the pose being real. Training of the network follows the strategy of conditional Generative Adversarial Networks (GANs). The effectiveness of the proposed network is evaluated on two widely used human pose estimation benchmark datasets. Our approach significantly outperforms state-of-the-art methods and almost always generates plausible human pose predictions.
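The sketch below condenses the adversarial training strategy described above: a generator predicts heatmaps, and a discriminator scores whether an (image, heatmaps) pair looks real. The generator and discriminator bodies are trivial stand-ins, not the paper's stacked multi-task architecture, and the adversarial loss weight is an assumed value.

```python
# Condensed conditional-GAN training step for heatmap-based pose estimation.
# G and D are stand-in modules; 16 joint heatmaps and adv_weight are assumptions.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(32, 16, 3, padding=1))            # 16 joint heatmaps
D = nn.Sequential(nn.Conv2d(3 + 16, 32, 3, padding=1), nn.ReLU(),
                  nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce, mse = nn.BCEWithLogitsLoss(), nn.MSELoss()

def train_step(img, gt_heatmaps, adv_weight=0.01):
    # Discriminator: real (image, gt) pairs vs. generated pairs.
    fake = G(img)
    d_real = D(torch.cat([img, gt_heatmaps], 1))
    d_fake = D(torch.cat([img, fake.detach()], 1))
    loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator: heatmap regression plus fooling the discriminator.
    d_fake = D(torch.cat([img, fake], 1))
    loss_g = mse(fake, gt_heatmaps) + adv_weight * bce(d_fake, torch.ones_like(d_fake))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()

if __name__ == "__main__":
    print(train_step(torch.randn(2, 3, 64, 64), torch.rand(2, 16, 64, 64)))
```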
Associated with the dramatic expansion of Chinese cities are the unprecedented scale and pace of changes to the urban living environment. There is an imperative to assess residents' perceptions of the neighbourhood environment and their impacts on life satisfaction. Drawing on a large-scale residential satisfaction survey conducted in Beijing in 2013, we examine the fine-grained spatial distribution and determinants of residents' life satisfaction. A multilevel ordinal response model is employed to investigate the roles of neighbourhood satisfaction, perceived relative income, socio-demographic characteristics, and contextual factors in predicting life satisfaction. Results show that satisfaction with key neighbourhood characteristics, including safety, physical and social environments, and travel convenience, is statistically significantly associated with life satisfaction. Income relative to that of peers in local areas or to that in the past is a more important predictor of life satisfaction than absolute income. Other individual-level variables, such as age, family structure, hukou status, health, commuting time, and housing-related variables including housing tenure and floor space, are significant correlates of life satisfaction.
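As a rough sketch of the kind of multilevel ordinal response model referred to above, assuming a logit link and neighbourhood-level random intercepts (the survey's exact specification may differ), one could write:

```latex
% Multilevel ordered-logit sketch: respondent i in neighbourhood j,
% life-satisfaction category k with thresholds \kappa_k, covariates x_{ij}
% (neighbourhood satisfaction, relative income, socio-demographics) and a
% neighbourhood random intercept u_j. Notation is illustrative, not the paper's.
\Pr(y_{ij} \le k \mid x_{ij}, u_j)
  = \operatorname{logit}^{-1}\!\left(\kappa_k - x_{ij}^{\top}\beta - u_j\right),
\qquad u_j \sim \mathcal{N}(0, \sigma_u^{2})
```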
The delineation of the clinical target volume (CTV) is a crucial, laborious, and subjective step in cervical cancer radiotherapy. The aim of this study was to propose and evaluate a novel end-to-end convolutional neural network (CNN) for fully automatic and accurate CTV delineation in cervical cancer. Methods: A total of 237 computed tomography (CT) scans of patients with locally advanced cervical cancer were collected and evaluated. A novel 2.5D CNN, called DpnUNet, was developed for CTV delineation and further applied for simultaneous CTV and organ-at-risk (OAR) delineation. Comprehensive comparisons and experiments were performed. The mean Dice similarity coefficient (DSC), 95th-percentile Hausdorff distance (95HD), and subjective evaluation were used to assess the performance of this method. Results: The mean DSC and 95HD values were 0.86 and 5.34 mm for the delineated CTVs. The clinical experts' subjective assessments showed that 90% of the predicted contours were acceptable for clinical use. The mean DSC and 95HD values were 0.91 and 4.05 mm for the bladder, 0.85 and 2.16 mm for bone marrow, 0.90 and 1.27 mm for the left femoral head, 0.90 and 1.51 mm for the right femoral head, 0.82 and 4.29 mm for the rectum, 0.85 and 4.35 mm for the bowel bag, and 0.82 and 4.96 mm for the spinal cord, respectively. The average delineation time for one patient's CT images was within 15 seconds. Conclusion: The experimental results demonstrate that the CTVs and OARs delineated for cervical cancer by DpnUNet were in close agreement with the ground truth. DpnUNet could significantly reduce radiation oncologists' contouring time.
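For reference, the two evaluation metrics reported above can be computed from binary segmentation masks as in the small sketch below. It assumes isotropic 1 mm voxel spacing and a 2D example for brevity; the naive pairwise-distance 95HD shown here is fine for a sketch but would be memory-hungry on full 3D CT volumes.

```python
# Sketch of the Dice similarity coefficient (DSC) and the 95th-percentile
# Hausdorff distance (95HD) between binary masks; 1 mm isotropic spacing assumed.
import numpy as np
from scipy.ndimage import binary_erosion

def dice(pred, gt):
    pred, gt = pred.astype(bool), gt.astype(bool)
    return 2.0 * np.logical_and(pred, gt).sum() / (pred.sum() + gt.sum())

def hd95(pred, gt):
    """Symmetric 95th-percentile surface distance between two binary masks."""
    def surface(m):
        m = m.astype(bool)
        return np.argwhere(m & ~binary_erosion(m))   # boundary voxel coordinates
    ps, gs = surface(pred), surface(gt)
    d = np.linalg.norm(ps[:, None, :] - gs[None, :, :], axis=-1)  # pairwise distances
    return np.percentile(np.concatenate([d.min(1), d.min(0)]), 95)

if __name__ == "__main__":
    a = np.zeros((64, 64), bool); a[20:40, 20:40] = True
    b = np.zeros((64, 64), bool); b[22:42, 22:42] = True
    print(round(dice(a, b), 3), round(hd95(a, b), 2))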
Edge computing allows more computing tasks to take place on decentralized nodes at the edge of networks. Today, many delay-sensitive, mission-critical applications can leverage these edge devices to reduce time delay or even to enable real-time, online decision making thanks to their on-site presence. Human object detection, behavior recognition, and prediction in smart surveillance fall into this category, where transmitting a huge volume of video streaming data can consume valuable time and place heavy pressure on communication networks. It is widely recognized that video processing and object detection are computing-intensive and too expensive to be handled by resource-limited edge devices. Inspired by the depthwise separable convolution and the Single Shot MultiBox Detector (SSD), a lightweight Convolutional Neural Network (L-CNN) is introduced in this paper. By narrowing down the classifier's search space to focus on human objects in surveillance video frames, the proposed L-CNN algorithm is able to detect pedestrians with a computation workload affordable to an edge device. A prototype has been implemented on an edge node (Raspberry Pi 3) using OpenCV libraries, and satisfactory performance is achieved on real-world surveillance video streams. The experimental study validates the design of L-CNN and shows it is a promising approach for computing-intensive applications at the edge.
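The depthwise separable convolution that the L-CNN builds on replaces a standard convolution with a per-channel (depthwise) 3x3 convolution followed by a 1x1 pointwise convolution, which substantially reduces multiply-adds and parameters. A minimal PyTorch sketch of such a block is shown below; channel counts and the BatchNorm/ReLU arrangement are illustrative, not the paper's exact layer configuration.

```python
# Sketch of a depthwise separable convolution block (MobileNet/SSD-style).
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        # Pointwise: 1x1 convolution mixes channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn1, self.bn2 = nn.BatchNorm2d(in_ch), nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.act(self.bn1(self.depthwise(x)))
        return self.act(self.bn2(self.pointwise(x)))

if __name__ == "__main__":
    block = DepthwiseSeparableConv(32, 64, stride=2)
    print(block(torch.randn(1, 32, 128, 128)).shape)   # torch.Size([1, 64, 64, 64])
```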