2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2016.207
Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network

Abstract: Recently, several models based on deep neural networks have achieved great success in terms of both reconstruction accuracy and computational performance for single-image super-resolution. In these methods, the low-resolution (LR) input image is upscaled to the high-resolution (HR) space using a single filter, commonly bicubic interpolation, before reconstruction. This means that the super-resolution (SR) operation is performed in HR space. We demonstrate that this is sub-optimal and adds computational complexity…
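The abstract's core idea is to perform the convolutions in LR space and only rearrange the output into HR space at the very end, via the sub-pixel ("periodic shuffling") operator. The sketch below is a minimal NumPy illustration of that rearrangement alone, without the learned convolutional layers; the `(C*r*r, H, W)` channels-first layout is an assumption chosen to match the common convention, and `pixel_shuffle` is an illustrative name, not the paper's code.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) array into (C, H*r, W*r).

    A toy version of the periodic-shuffling operator PS: each group of
    r*r feature maps is interleaved into an r-times-larger spatial grid.
    """
    c_rr, h, w = x.shape
    c = c_rr // (r * r)
    # split the channel axis into (C, r, r), then interleave the two
    # r-axes with the spatial axes
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)          # (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

# 4 feature maps of size 3x4 become one 6x8 image for r = 2
feats = np.arange(4 * 3 * 4, dtype=float).reshape(4, 3, 4)
hr = pixel_shuffle(feats, 2)
```

Because the shuffle is a pure reindexing, all heavy computation can stay in LR space, which is where the paper's efficiency gain comes from.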

Cited by 4,977 publications (3,103 citation statements) · References 49 publications
“…The generator function G_{θ_G} is parametrized by θ_G, and the discriminator function D_{θ_D} is parametrized by θ_D. The target of previous supervised SR algorithms is commonly the minimization of the mean squared error (MSE) [Wang and Bovik, 2009] between the recovered HR image and the ground truth. Besides the MSE loss, SR-GAN also defines a perceptual loss using high-level feature maps of the VGG network [Simonyan and Zisserman, 2014], which makes the super-resolved image and the HR reference image perceptually similar.…”
Section: Revisit SR-GAN
Confidence: 99%
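The distinction this citation statement draws — pixel-space MSE versus a feature-space perceptual loss — can be sketched in a few lines. This is a toy illustration under stated assumptions: `toy_features` is a single fixed random 3×3 convolution with ReLU standing in for the deep pre-trained VGG feature maps; the real SR-GAN loss uses VGG-19 activations, not this stand-in.

```python
import numpy as np

def mse_loss(sr, hr):
    # pixel-space loss used by earlier supervised SR methods
    return float(np.mean((sr - hr) ** 2))

def perceptual_loss(sr, hr, phi):
    # SR-GAN-style content loss: compare feature maps phi(.), not pixels
    return float(np.mean((phi(sr) - phi(hr)) ** 2))

# Stand-in for a pre-trained VGG extractor: one fixed 3x3 convolution
# followed by ReLU (hypothetical, for illustration only).
rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3))

def toy_features(img):
    h, w = img.shape
    out = np.empty((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            # valid 3x3 convolution + ReLU at each position
            out[i, j] = max(np.sum(img[i:i + 3, j:j + 3] * kernel), 0.0)
    return out

a = rng.standard_normal((8, 8))
b = rng.standard_normal((8, 8))
```

Both losses vanish when the two images are identical; they differ in what "close" means — the perceptual loss tolerates pixel-level shifts that leave the feature response unchanged, which is why it correlates better with perceived similarity.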
“…Furthermore, a VGG loss based on the ReLU activation layers of the pre-trained 19-layer VGG network described in [Simonyan and Zisserman, 2014] is exploited for perceptual similarity, measuring differences at a higher semantic level that a naive MSE loss is unable to capture. It should be mentioned that the VGG losses are adopted in all the VGG networks, which share common parameters.…”
Section: Generator Network Loss
Confidence: 99%