Recently, deep learning based video super-resolution (SR) methods have achieved promising performance. To exploit the spatial and temporal information of videos simultaneously, employing 3-dimensional (3D) convolutions is a natural approach. However, using 3D convolutions directly can incur excessively high computational complexity, which restricts the depth of video SR models and thus undermines their performance. In this paper, we present a novel fast spatio-temporal residual network (FSTRN) that adopts 3D convolutions for the video SR task to enhance performance while maintaining a low computational load. Specifically, we propose a fast spatio-temporal residual block (FRB) that divides each 3D filter into the product of two 3D filters of considerably lower dimensions. Furthermore, we design a cross-space residual learning scheme that directly links the low-resolution space and the high-resolution space, which greatly relieves the computational burden on the feature fusion and up-scaling parts. Extensive evaluations and comparisons on benchmark datasets validate the strengths of the proposed approach and demonstrate that the proposed network significantly outperforms current state-of-the-art methods.
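The computational saving from this kind of filter factorization can be illustrated with a simple parameter count. The sketch below assumes a hypothetical FRB-style split of a k x k x k filter into a 1 x k x k spatial filter followed by a k x 1 x 1 temporal filter (the exact factorization used by FSTRN may differ; channel counts and function names here are illustrative):

```python
def conv3d_params(c_in, c_out, kt, kh, kw):
    # Parameter count of one 3D convolution layer (bias terms ignored).
    return c_in * c_out * kt * kh * kw

def frb_style_params(c_in, c_out, k):
    # Hypothetical FRB-style factorization: one k x k x k filter is replaced
    # by a 1 x k x k (spatial) filter followed by a k x 1 x 1 (temporal) one.
    return conv3d_params(c_in, c_out, 1, k, k) + conv3d_params(c_out, c_out, k, 1, 1)

full = conv3d_params(64, 64, 3, 3, 3)   # full 3D filter: 110592 parameters
fact = frb_style_params(64, 64, 3)      # factorized pair:  49152 parameters
```

With 64 channels and k = 3, the factorized pair uses fewer than half the parameters of the full 3D filter, which is what allows a deeper network at comparable cost.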
Residual connections significantly boost the performance of deep neural networks. However, there are few theoretical results that address the influence of residuals on the hypothesis complexity and the generalization ability of deep neural networks. This paper studies the influence of residual connections on the hypothesis complexity of a neural network in terms of the covering number of its hypothesis space. We prove that the upper bound of the covering number is the same as for chain-like neural networks, provided the total numbers of weight matrices and nonlinearities are fixed, no matter whether they appear in the residuals or not. This result demonstrates that residual connections may not increase the hypothesis complexity of a neural network compared with its chain-like counterpart. Based on the upper bound of the covering number, we then obtain an O(1/√N) margin-based multi-class generalization bound for ResNet, as an exemplary case of a deep neural network with residual connections. Generalization guarantees for similar state-of-the-art architectures, such as DenseNet and ResNeXt, follow straightforwardly. From our generalization bound, a practical implication emerges: to achieve good generalization, regularization terms should be used to keep the norms of the weight matrices from growing too large, which justifies the standard technique of weight decay.
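The general shape of such a margin-based bound can be sketched as follows; this is a schematic form of a standard covering-number argument, not the paper's exact statement, and the complexity term $\mathcal{C}_{\mathcal{F}}$ is a placeholder for the quantity the covering-number bound yields (which depends on the norms of the weight matrices):

\[
R(f) \;\le\; \widehat{R}_{\gamma}(f) \;+\; \mathcal{O}\!\left(\frac{\mathcal{C}_{\mathcal{F}}}{\gamma\sqrt{N}}\right),
\]

where $R(f)$ is the expected risk, $\widehat{R}_{\gamma}(f)$ is the empirical margin loss at margin $\gamma > 0$, and $N$ is the sample size. Since $\mathcal{C}_{\mathcal{F}}$ grows with the weight-matrix norms, penalizing those norms (weight decay) tightens the bound, which is the practical implication summarized in the abstract.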
Device-free passive detection is an emerging technology for detecting whether any moving entities are present in an area of interest without attaching any device to them. It is an essential primitive for a broad range of applications, including intrusion detection for safety precautions, patient monitoring in hospitals, child and elder care at home, and so forth. Despite the prevalence of the Received Signal Strength (RSS) signal feature, most robust and reliable solutions resort to a finer-grained channel descriptor at the physical layer, e.g., the Channel State Information (CSI) in the 802.11n standard. Among a large body of emerging techniques, however, few have explored the full potential of CSI for human detection. Moreover, the space diversity supported by today's popular multi-antenna systems has not been investigated to a comparable extent as frequency diversity. In this article, we propose a novel scheme for device-free PAssive Detection of moving humans with dynamic Speed (PADS). Both the full information (amplitude and phase) of CSI and the space diversity across multiple antennas in MIMO systems are exploited to extract and shape sensitive metrics for accurate and robust target detection. We prototype PADS on commercial WiFi devices, and experimental results in different scenarios demonstrate that PADS achieves substantial performance improvements in spite of dynamic human movements.
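The intuition behind CSI-based detection with space diversity can be sketched with a toy metric; this is an illustrative example only, not the actual PADS algorithm, and the window format, threshold, and fusion rule are all assumptions:

```python
import math

def csi_motion_score(amplitudes):
    # Normalized standard deviation of CSI amplitude over a time window.
    # A moving human perturbs the multipath profile, inflating this score;
    # a static environment keeps it near zero.
    mean = sum(amplitudes) / len(amplitudes)
    var = sum((a - mean) ** 2 for a in amplitudes) / len(amplitudes)
    return math.sqrt(var) / (mean + 1e-9)

def detect_motion(antenna_windows, threshold=0.1):
    # Exploit space diversity: fuse the per-antenna scores by taking the
    # maximum, so movement seen by any one antenna triggers detection.
    return max(csi_motion_score(w) for w in antenna_windows) > threshold
```

A real system works on complex CSI (amplitude and phase) per subcarrier and must calibrate the threshold per environment; the max-fusion here is just one simple way to combine antennas.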
Low-cost Unmanned Airborne Vehicles (UAVs) equipped with consumer-grade imaging systems have emerged as a potential remote sensing platform that could satisfy the needs of a wide range of civilian applications. Among these applications, UAV-based agricultural mapping and monitoring have attracted significant attention from both the research and professional communities. The interest in UAV-based remote sensing for agricultural management is motivated by the need to maximize crop yield. Remote sensing-based crop yield prediction and estimation are primarily based on imaging systems with different spectral coverage and resolution (e.g., RGB and hyperspectral imaging systems). Due to the data volume, RGB imaging is based on frame cameras, while hyperspectral sensors are primarily push-broom scanners. To cope with the limited endurance and payload constraints of low-cost UAVs, the agricultural research and professional communities have to rely on consumer-grade and lightweight sensors. However, the geometric fidelity of information derived from push-broom hyperspectral scanners is quite sensitive to the position and orientation established through a direct geo-referencing unit onboard the imaging platform (i.e., an integrated Global Navigation Satellite System (GNSS) and Inertial Navigation System (INS)). This paper presents an automated framework for the integration of frame RGB images, push-broom hyperspectral scanner data, and consumer-grade GNSS/INS navigation data for accurate geometric rectification of the hyperspectral scenes. The approach relies on utilizing the navigation data, together with a modified Speeded-Up Robust Feature (SURF) detector and descriptor, for automating the identification of conjugate features in the RGB and hyperspectral imagery.
The SURF modification takes into consideration the available direct geo-referencing information to improve the reliability of the matching procedure in the presence of repetitive texture within a mechanized agricultural field. Identified features are then used to improve the geometric fidelity of the previously ortho-rectified hyperspectral data. Experimental results from two real datasets show that the geometric rectification of the hyperspectral data was improved by almost one order of magnitude.
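The idea of using direct geo-referencing to constrain feature matching in repetitive fields can be sketched in simplified form. This is not the paper's modified SURF; it assumes hypothetical feature tuples whose hyperspectral coordinates have already been predicted into RGB image space via the GNSS/INS data, and the gating radius and ratio threshold are illustrative:

```python
def match_features(rgb_feats, hyp_feats, max_pixel_offset=30.0, ratio=0.8):
    # rgb_feats / hyp_feats: lists of (x, y, descriptor) tuples; descriptors
    # are plain tuples of floats standing in for SURF descriptors.
    def dist(d1, d2):
        return sum((a - b) ** 2 for a, b in zip(d1, d2)) ** 0.5

    matches = []
    for x, y, d in rgb_feats:
        # Geo-referencing gate: only consider candidates whose navigation-
        # predicted position is nearby, suppressing the ambiguity caused by
        # repetitive texture (e.g., parallel crop rows).
        cands = [(dist(d, dh), i) for i, (xh, yh, dh) in enumerate(hyp_feats)
                 if (x - xh) ** 2 + (y - yh) ** 2 <= max_pixel_offset ** 2]
        cands.sort()
        # Lowe-style ratio test against the second-best gated candidate.
        if cands and (len(cands) == 1 or cands[0][0] < ratio * cands[1][0]):
            matches.append((x, y, cands[0][1]))
    return matches
```

The key design choice is that the spatial gate runs before the descriptor ratio test, so a visually identical feature from a distant crop row never competes as a match candidate.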