This paper proposes a change detection algorithm for multi-spectral images based on a feature-level U-Net. A low-complexity pan-sharpening method is proposed to exploit not only panchromatic images but also multi-spectral images to enhance the performance of the deep neural network. The resulting high-resolution multi-spectral (HRMS) images are then fed into the proposed feature-level U-Net, which consists of two stages: a feature-level subtracting network and a U-Net. The feature-level subtracting network extracts dynamic difference images (DIs) that exploit both low-level and high-level features. With this network, change detection performance can be improved using fewer U-Net layers and lower computational complexity. Furthermore, the proposed algorithm detects small changes by taking advantage of both geometric and spectral resolution enhancement through an intensity-hue-saturation (IHS) pan-sharpening method. A modified IHS pan-sharpening algorithm is introduced to mitigate the spectral distortion problem by applying mean filtering to the high-frequency component. We found that the proposed change detection on HRMS images outperforms existing change detection algorithms, achieving an average F-1 score of 0.62, a percentage correct classification (PCC) of 98.78%, and a kappa of 61.60 on the test datasets. INDEX TERMS Convolutional neural network, deep learning, remote sensing, satellite images, change detection.
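The IHS-based sharpening step described above can be sketched as follows. This is a simplified illustration, not the paper's implementation: the nearest-neighbour upsampling, the band-average intensity, and the 3x3 kernel size are all assumptions made for the sketch.

```python
import numpy as np

def mean_filter(img, k=3):
    """Simple k x k box (mean) filter with edge padding."""
    p = k // 2
    padded = np.pad(img, p, mode="edge")
    out = np.zeros_like(img, dtype=float)
    h, w = img.shape
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def ihs_pansharpen(ms, pan, k=3):
    """Fast-IHS pan-sharpening with mean-filtered detail injection.

    ms  : (H/r, W/r, B) low-resolution multi-spectral image
    pan : (H, W) panchromatic image
    The mean filtering of the high-frequency detail mirrors the
    modification described in the abstract.
    """
    r = pan.shape[0] // ms.shape[0]
    up = ms.repeat(r, axis=0).repeat(r, axis=1)   # upsample to the PAN grid
    intensity = up.mean(axis=2)                   # IHS intensity component
    detail = mean_filter(pan - intensity, k)      # smoothed high-frequency part
    return up + detail[..., None]                 # inject detail into every band
```

Because the detail term is low-pass filtered before injection, abrupt differences between the panchromatic intensity and the multi-spectral intensity contribute less to the fused bands, which is the intuition behind reduced spectral distortion.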
This paper proposes an optimal rate control model based on deep neural network (DNN) features to improve coding tree unit (CTU)-level rate control in High Efficiency Video Coding (HEVC) for conversational videos. The proposed algorithm extracts high-level features from the original and previously reconstructed CTU blocks using a predefined DNN model, the visual geometry group (VGG-16) network. The correlation between the high-level features and the quantization parameter (QP) values of previously coded CTUs is then explored to capture subjective visual characteristics and to estimate the CTU-level rate control model parameters (alpha and beta) and the bit allocation of each CTU. This paper also proposes a new lambda estimation model for each CTU that improves the relationship between lambda and the estimated bits per pixel to control rate and relative distortion. Furthermore, the lambda and QP boundary settings are adjusted based on the proposed perceptual model to ensure the rate control accuracy of each CTU. Compared to the rate control model in HM-16.20, experiments show that the proposed algorithm achieves higher bitrate accuracy and an average BD-rate gain on the PSNR, SSIM, and MS-SSIM metrics under the low-delay-P configuration. INDEX TERMS Deep neural network, high efficiency video coding (HEVC), rate control, video coding.
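The bit-allocation and R-lambda pipeline the abstract builds on can be sketched in a few lines. The per-CTU weights here stand in for the VGG-16 feature correlations of the paper (a simplifying assumption); the alpha/beta defaults and the lambda-to-QP mapping are the well-known HM initial values, not the paper's adapted parameters.

```python
import math

def allocate_ctu_bits(frame_bits, weights):
    """Distribute a frame's bit budget over CTUs in proportion to
    per-CTU weights. In the paper the weights come from DNN feature
    correlations; here they are plain numbers (assumption)."""
    total = sum(weights)
    return [frame_bits * w / total for w in weights]

def lambda_from_bpp(bpp, alpha=3.2003, beta=-1.367):
    """R-lambda model: lambda = alpha * bpp^beta (HM initial values)."""
    return alpha * (bpp ** beta)

def qp_from_lambda(lam):
    """HM's lambda-to-QP mapping, clipped to the valid [0, 51] range."""
    return max(0, min(51, round(4.2005 * math.log(lam) + 13.7122)))
```

A CTU with a larger weight receives more bits, hence a smaller bits-per-pixel-driven lambda and a lower QP; the paper's contribution is estimating those weights and the (alpha, beta) pair from high-level features rather than from plain pixel statistics.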
This work proposes a rate control model based on deep convolutional features to improve the coding performance of HEVC encoders under the random access (RA) configuration. The proposed algorithm extracts high-level features from the original and previously coded frames using a pretrained visual geometry group (VGG-16) model, considering the characteristics of each temporal layer in the RA configuration. Subsequently, the R-λ parameters (alpha and beta), bit allocation, λ estimation, and frame-level quantization parameter decision are formulated using the extracted high-level features to maintain video quality and bitrate accuracy. In addition, bit allocation at the group-of-pictures (GOP) level is proposed with perceptual thresholding to smooth bitrate and visual quality between adjacent GOPs. The results verify that the proposed algorithm is efficient in coding performance and bit accuracy while preserving visual quality. Compared with the existing R-λ rate model in HM-16.20, the proposed models achieve average BD-rate gains of -4.39% and -8.74% on the PSNR and MS-SSIM metrics for the RA configuration, respectively. INDEX TERMS Deep neural network, high efficiency video coding (HEVC), perceptual video coding, rate control, video coding.
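The GOP-level smoothing idea can be illustrated with a minimal clamp: the current GOP's budget is held within a relative band around the previous GOP's bits. The 20% threshold below is a hypothetical value chosen for the sketch, not the perceptual threshold derived in the paper.

```python
def gop_bit_budget(avg_gop_bits, prev_gop_bits, threshold=0.2):
    """Clamp the current GOP's bit budget to within a relative
    threshold of the previous GOP's actual bits, smoothing bitrate
    (and hence quality) across adjacent GOPs."""
    lo = prev_gop_bits * (1.0 - threshold)
    hi = prev_gop_bits * (1.0 + threshold)
    return max(lo, min(hi, avg_gop_bits))
```

Without such a bound, a complexity spike in one GOP could starve or flood its neighbours, producing visible quality fluctuation between GOPs.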
This study proposes a bilateral attention U-Net with a dissimilarity attention gate (DAG) for change detection on remote sensing imagery. The proposed network uses bilateral dissimilarity encoding for the DAG calculation to handle reversible input images, yielding high detection rates regardless of the order of the two input images. The DAG exploits all combinations of joint features to avoid spectral information loss before they are fed into an attention gate on the decoder side. The effectiveness of the proposed method was evaluated on the KOMPSAT-3 satellite image dataset and the aerial change detection dataset (CDD). Its performance exceeded that of conventional methods (specifically, U-Net, ATTUNet, and Modified-UNet++), achieving average F1-score and kappa coefficient (KC) values of 0.68 and 66.93, respectively, on the KOMPSAT-3 dataset. On CDD, it achieved F1-score and KC values of 0.70 and 68.74, respectively, again better than the conventional methods. In addition, we found that the proposed bilateral attention U-Net produces the same change map even when the image order is reversed.
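The order-invariance property rests on a simple principle that can be sketched directly: if the gate is built only from symmetric combinations of the two encoder features, swapping the inputs cannot change the attention map. The sigmoid gating and the particular combinations below are illustrative choices, not the paper's exact DAG formulation.

```python
import numpy as np

def dissimilarity_gate(f1, f2):
    """Order-invariant attention sketch: only symmetric combinations
    of the two feature maps (absolute difference and sum) enter the
    gate, so dissimilarity_gate(f1, f2) == dissimilarity_gate(f2, f1)
    by construction."""
    diff = np.abs(f1 - f2)        # symmetric in (f1, f2)
    joint = f1 + f2               # symmetric in (f1, f2)
    return 1.0 / (1.0 + np.exp(-(diff + joint)))   # sigmoid attention map
```

A gate built from an ordered difference such as f1 - f2 would break this property, which is why the bilateral encoding matters for reversible inputs.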
In this paper, we propose a partial decoding method with limited memory usage for high-speed thumbnail extraction. The proposed method performs a partial inverse transform and partial intra prediction to reconstruct only the pixels required for intra prediction and for the thumbnail. The reconstructed pixels in the bottom row and right column of each block are then stored in a line buffer and a thumbnail buffer, rather than in a full-resolution decoded picture buffer. Although the H.264/AVC, HEVC, and VP9 video codecs use different coding structures, prediction modes, and transforms, the proposed algorithm can be applied to each of them in the same manner. To evaluate its performance, we implemented the proposed algorithm for H.264/AVC, HEVC, and VP9, and found that the thumbnail extraction time decreased by 66% for H.264/AVC, 52% for HEVC, and 48% for VP9 compared to full decoding.
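The buffer scheme behind the memory saving can be sketched as follows. Instead of a full-resolution picture buffer, only a one-line-wide buffer (for blocks below), a one-column-wide buffer (for the block to the right), and sub-sampled thumbnail pixels are kept per block. The function and parameter names are illustrative, not from the paper, and actual intra prediction from the buffers is omitted.

```python
import numpy as np

def partial_decode_thumbnail(blocks, grid, bs, step):
    """Memory-limited thumbnail extraction sketch.

    blocks : dict mapping (row, col) -> (bs, bs) reconstructed block
    grid   : (rows, cols) of the block grid
    step   : thumbnail sub-sampling factor
    """
    rows, cols = grid
    line_buffer = np.zeros(cols * bs)   # one frame-width line (bottom rows)
    right_buffer = np.zeros(bs)         # one block-height column (right edges)
    thumb = np.zeros((rows * bs // step, cols * bs // step))
    for r in range(rows):
        for c in range(cols):
            blk = blocks[(r, c)]
            # Neighbouring pixels for intra prediction would be read from
            # line_buffer (block above) and right_buffer (block to the left).
            line_buffer[c * bs:(c + 1) * bs] = blk[-1, :]   # keep bottom row
            right_buffer[:] = blk[:, -1]                    # keep right column
            thumb[r * bs // step:(r + 1) * bs // step,
                  c * bs // step:(c + 1) * bs // step] = blk[::step, ::step]
    return thumb
```

The peak memory is thus one line of the frame plus one block column plus the thumbnail itself, independent of the full picture size, which is what makes the method attractive for fast thumbnail-only decoding.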