Visual quality and algorithm efficiency are the two main interests in video frame interpolation. We propose a hybrid task-based convolutional neural network for fast and accurate frame interpolation of 4K videos. The proposed method synthesizes low-resolution frames, then reconstructs high-resolution frames in a coarse-to-fine fashion. We also propose an edge loss to preserve high-frequency information and make the synthesized frames look sharper. Experimental results show that the proposed method achieves state-of-the-art performance and runs 2.69x faster than the existing methods that are operable on 4K videos, while maintaining comparable visual and quantitative quality.

Symmetry 2019, 11, 619

…separable kernels for memory efficiency, but they still suffered from expensive computational costs. These methods not only have this problem, but also yield poor interpolation results for high-resolution video frames. Their performance depends mainly on the kernel size, and larger kernels are necessary to produce good results for large motion. They produce ghost or blur artifacts on high-resolution video frames, since such frames tend to contain larger motion. Niklaus et al. [15] proposed a context-aware synthesis approach that warps not only the input frames but also their pixel-wise contextual information, and uses them to interpolate a high-quality intermediate frame. However, this approach demands much more memory, since the pixel-wise contextual information has the same resolution as the input frames. Although the majority of video interpolation research [11][12][13][14][15][16][17] has focused on visual and quantitative quality, there have been insufficient studies on handling high-resolution video.
This is because these methods are memory-intensive, which is a major obstacle to interpolating high-resolution video frames.

In this paper, we propose a novel hybrid task-based convolutional neural network for the fast and accurate frame interpolation of 4K videos. Our network is composed of a temporal interpolation (TI) network and a spatial interpolation (SI) network, each with a different objective. The TI network interpolates intermediate frames, which are the same size as the downsampled input frames. The SI network reconstructs original-scale frames from the predicted intermediate frames, similar to the super-resolution task [18][19][20][21]. The SI network exploits interpolation feature maps extracted from the TI network through our skip connection. To reduce the number of channels of the interpolation feature maps, we compress them into smaller dimensions instead of concatenating them as in other methods [11,14,22]. Thus, our SI network can remain shallow while still performing well, which reduces computation and shortens the inference time. We also propose an edge loss to preserve high-frequency information and make the synthesized frames look sharper. The proposed network uses the YCbCr 4:2:0 color format, which is commonly used in video coding, for both input and output. The...
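The edge loss mentioned above can be illustrated as a penalty on the difference between edge maps of the predicted and ground-truth frames. The following is a minimal sketch, assuming an L1 distance between Sobel gradient magnitudes; the paper's exact formulation may differ, and all names here are illustrative rather than taken from the authors' code.

```python
import numpy as np

# Standard Sobel filters for horizontal and vertical gradients.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

def conv2d_valid(img, kernel):
    """Plain 'valid' 2D cross-correlation (no padding), for illustration."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def edge_loss(pred, target):
    """Mean L1 distance between Sobel gradient magnitudes of two frames."""
    def grad_mag(x):
        gx = conv2d_valid(x, SOBEL_X)
        gy = conv2d_valid(x, SOBEL_Y)
        return np.sqrt(gx ** 2 + gy ** 2)
    return np.mean(np.abs(grad_mag(pred) - grad_mag(target)))
```

Because the loss compares gradient magnitudes rather than raw intensities, it specifically penalizes blurred or missing edges, which matches the stated goal of making synthesized frames look sharper.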
Recently, video frame interpolation research based on convolutional neural networks has shown remarkable results. However, these methods demand huge amounts of memory and run time for high-resolution videos, and are unable to process a 4K frame in a single pass. In this paper, we propose a fast 4K video frame interpolation method based upon a multi-scale optical flow reconstruction scheme. The proposed method predicts low-resolution bidirectional optical flow and reconstructs it at high resolution. We also propose consistency and multi-scale smoothness losses to enhance the quality of the predicted optical flow. Furthermore, we use an adversarial loss to make the interpolated frame more seamless and natural. We demonstrate that the proposed method outperforms the existing state-of-the-art methods in quantitative evaluation, while running up to 4.39× faster than those methods on 4K videos.
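Two ideas in this abstract can be sketched in a few lines: reconstructing a high-resolution flow field from a low-resolution prediction (displacements must be rescaled along with the grid), and a consistency penalty on bidirectional flow. This is a simplified illustration under stated assumptions (nearest-neighbor upsampling; forward and backward flow compared at the same pixel rather than at warped correspondences), not the paper's network.

```python
import numpy as np

def upscale_flow(flow_lr, scale):
    """Upsample a low-resolution flow field (H, W, 2) by an integer factor
    and rescale its displacement magnitudes to match the new resolution."""
    up = np.repeat(np.repeat(flow_lr, scale, axis=0), scale, axis=1)
    return up * scale  # a 1-pixel shift at low res is `scale` pixels at high res

def consistency_loss(flow_fw, flow_bw):
    """Penalize forward/backward flows that do not cancel out. Simplified:
    the backward flow is read at the same pixel, ignoring the warp to the
    true corresponding location."""
    return np.mean(np.abs(flow_fw + flow_bw))
```

The rescaling step is the part that is easy to get wrong: interpolating only the grid, without multiplying the flow values by the scale factor, silently shrinks all predicted motion.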
In this paper, a new scheme to recognize finger shapes in depth images captured by Kinect is proposed. The input finger shape is pre-processed with a rigid transformation for robustness to the angle of the input fingers. After extracting a contour map from the hand region, the change of contour pixel locations is observed to calculate a rotational compensation angle. For finger shape recognition, we first acquire three pixel points: the leftmost, rightmost, and topmost pixels. We then use geometrical features of human fingers, such as the Euclidean distance between each pair of points, the angle of the finger, and the pixel area of the hand region, to recognize the finger shape. Experimental results show that the proposed algorithm performs better than previous schemes. Keywords: human-computer interaction, finger shape recognition, fingertip detection, interaction.
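The geometric features the abstract mentions can be computed directly from the three extracted points. Below is a minimal sketch, assuming the angle feature is the apex angle at the topmost point obtained via the law of cosines; the function and feature names are hypothetical, not from the paper.

```python
import math

def euclid(p, q):
    """Euclidean distance between two (x, y) pixel points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def finger_features(left, right, top):
    """Return the three pairwise distances and the apex angle (degrees)
    at `top`, computed from leftmost, rightmost, and topmost contour pixels."""
    d_lr = euclid(left, right)
    d_lt = euclid(left, top)
    d_rt = euclid(right, top)
    # Law of cosines for the angle at the top point; clamp for safety.
    cos_a = (d_lt ** 2 + d_rt ** 2 - d_lr ** 2) / (2 * d_lt * d_rt)
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))
    return d_lr, d_lt, d_rt, angle
```

For example, for points (0, 0), (2, 0), and (1, 1), the apex angle at the top point is 90 degrees. A classifier could threshold such features (together with the hand-region pixel area) to distinguish finger shapes.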