In this paper, an intra prediction method based on plane modelling is proposed for coding depth pictures. Each pixel in a depth picture represents the distance from the camera to an object surface, so pixels corresponding to a flat object surface lie on a plane. That plane can be represented by a simple equation in the 3D camera coordinate system, to which the coordinate system of the depth pixels can be transformed. This paper finds the parameters that define the plane closest to the given depth pixels; the plane model is then used to predict the depth pixels lying on that plane. A depth prediction method with variable-size blocks is also devised for efficient intra prediction of depth pictures: a plane surface that occupies a large part of the picture can be predicted with a large block size. Simulation results show that, compared with the intra prediction modes of H.264/AVC and H.265/HEVC, the proposed method reduces the mean squared error by up to 96.6% for 4 × 4 blocks and by up to 98% for 16 × 16 blocks.
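The plane-fitting step described above can be sketched as a least-squares fit. This is a minimal illustration, not the paper's implementation: the paper fits the plane in 3D camera coordinates, whereas this sketch fits d = a·x + b·y + c directly in pixel coordinates, and the function names are hypothetical.

```python
import numpy as np

def fit_plane(depth_block):
    """Least-squares fit of a plane d = a*x + b*y + c to a depth block.

    Hypothetical sketch: fits in pixel coordinates for brevity, while the
    paper works in the 3D camera coordinate system.
    """
    h, w = depth_block.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Design matrix: one row [x, y, 1] per pixel
    A = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)], axis=1)
    params, *_ = np.linalg.lstsq(A, depth_block.ravel(), rcond=None)
    return params  # (a, b, c)

def predict_block(params, h, w):
    """Predict a depth block from the fitted plane parameters."""
    ys, xs = np.mgrid[0:h, 0:w]
    a, b, c = params
    return a * xs + b * ys + c
```

For a block whose pixels truly lie on a plane, the fit recovers the parameters exactly and the prediction error is zero, which is why the MSE reduction is largest on flat regions.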
This paper proposes a method to classify food types and to estimate meal intake amounts from pre- and post-meal images using a deep-learning object detection network. Food types and food regions are detected with Mask R-CNN. To bring the pre- and post-meal images into the same capturing environment, the post-meal image is corrected by a homography transformation based on the meal-plate regions in both images. The 3D shape of each food is modelled as a spherical cap, a cone, or a cuboid depending on the food type, and the meal intake amount is estimated as the food volume difference between the pre-meal and post-meal images. In simulation results, the food classification accuracy and the food region detection accuracy reach up to 97.57% and 93.6%, respectively.
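The volume-difference step can be sketched with the three shape models named in the abstract. This is a hedged illustration with hypothetical helper names; the paper's actual area and height estimation from the segmented food regions is not shown here.

```python
import math

def food_volume(shape, base_area, height):
    """Approximate food volume for the three shape models in the abstract.

    spherical cap: V = (pi*h/6) * (3*r^2 + h^2), r from the base area
    cone:          V = base_area * height / 3
    cuboid:        V = base_area * height
    """
    if shape == "spherical_cap":
        r = math.sqrt(base_area / math.pi)
        return math.pi * height / 6.0 * (3.0 * r * r + height * height)
    if shape == "cone":
        return base_area * height / 3.0
    if shape == "cuboid":
        return base_area * height
    raise ValueError(f"unknown shape: {shape}")

def intake_amount(shape, pre_area, post_area, height):
    # Intake = pre-meal volume minus post-meal volume (same shape model)
    return food_volume(shape, pre_area, height) - food_volume(shape, post_area, height)
```

The homography correction matters here because the pre- and post-meal base areas are only comparable once both images are warped to the same plate geometry.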
In this paper, two zoom motion estimation methods for color and depth videos that exploit depth information are proposed. Zoom motion is estimated independently for the color and depth videos: the color video is scaled in the spatial domain only, while the depth video is scaled in both the spatial and depth domains. For the color video, instead of testing all possible zoom ratios for the current block as existing zoom motion estimation methods do, the proposed method determines the zoom ratio as the ratio of the average depth values of the current and reference blocks. The reference block is then resized by the zoom ratio and mapped to the current block. For the depth video, the reference block is first scaled in the spatial direction by the same procedure used for the color video and then scaled by the ratio of the distances from the camera to the objects. Compared with conventional motion estimation, the proposed method reduces the MSE by up to about 30% for color video and up to about 85% for depth video.
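The core shortcut (deriving one zoom ratio from depth averages instead of searching over all ratios) can be sketched as below. This is an assumption-laden illustration: the ratio is taken as reference-to-current average depth (an object moving closer appears larger, so the reference is enlarged), and nearest-neighbor resizing stands in for whatever interpolation the paper uses.

```python
import numpy as np

def zoom_ratio(cur_depth_block, ref_depth_block):
    # Zoom ratio from average depths, avoiding a search over candidate ratios.
    # Direction (ref/cur) is an assumption for this sketch.
    return float(np.mean(ref_depth_block)) / float(np.mean(cur_depth_block))

def resize_nearest(block, ratio):
    """Resize a 2D block by `ratio` using nearest-neighbor sampling."""
    h, w = block.shape
    nh, nw = max(1, round(h * ratio)), max(1, round(w * ratio))
    ys = np.minimum((np.arange(nh) / ratio).astype(int), h - 1)
    xs = np.minimum((np.arange(nw) / ratio).astype(int), w - 1)
    return block[np.ix_(ys, xs)]

def scale_depth_block(ref_depth_block, ratio):
    # Depth video: spatial scaling plus scaling of the depth values
    # themselves by the camera-to-object distance ratio.
    return resize_nearest(ref_depth_block, ratio) / ratio
```

Replacing the per-ratio search with a single depth-derived ratio is what makes the method cheaper than exhaustive zoom estimation.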
In this paper, we propose a passage guidance method for visually impaired people based on braille block detection. The proposed method recognizes passage information by detecting braille blocks individually in a captured image through a neural network. The braille blocks are detected with YOLOv7, a state-of-the-art object detection network. The placements of the detected blocks are then analyzed to find groups of straight blocks arranged in a single line and groups of dot blocks gathered in a square shape. Passage information is recognized by comparing the analyzed block placement against predefined placements. Objects on the sidewalk are detected together with the braille blocks to warn of obstacles, and the passage information is delivered to the user by voice. In simulation results, the proposed method recognized passage information with about 85% accuracy.
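The placement-analysis step (single-line straight blocks vs. square-shaped dot clusters) can be sketched as a simple geometric test on detected block centers. This is a hypothetical sketch, not the paper's algorithm: it classifies collinearity via the singular values of the centered points.

```python
import numpy as np

def placement_type(centers, tol=0.1):
    """Classify the placement of detected braille-block centers.

    Hypothetical helper: returns 'line' when the centers are nearly
    collinear (second singular value small relative to the first),
    otherwise 'cluster'. `tol` is an assumed threshold.
    """
    c = np.asarray(centers, dtype=float)
    c = c - c.mean(axis=0)
    # Second singular value measures deviation from a straight line.
    s = np.linalg.svd(c, compute_uv=False)
    return "line" if s[1] <= tol * max(s[0], 1e-9) else "cluster"
```

A line of straight blocks would then be read as "path continues", while a square cluster of dot blocks signals a stop or decision point, matched against the predefined placements.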
In this paper, we propose an intra-picture prediction method for depth video that performs block clustering through a neural network. The proposed method addresses the problem that a block containing two or more clusters degrades the performance of intra prediction for depth video. The proposed network consists of a spatial feature prediction network and a clustering network. The spatial feature prediction network exploits spatial features in the vertical and horizontal directions and contains a 1D CNN layer and a fully connected layer. The 1D CNN layer extracts the vertical and horizontal spatial features from the top and left blocks of reference pixels, respectively. Although 1D CNNs are designed for time-series data, they can also capture spatial features by treating the pixel order along a given direction as a timestamp. The fully connected layer predicts the spatial features of the block to be coded from the extracted features. The clustering network finds clusters from the spatial features output by the spatial feature prediction network. It consists of four CNN layers: the first three combine the two spatial features in the vertical and horizontal directions, and the last outputs the probabilities that pixels belong to each cluster. The pixels of the block are predicted by the representative value of each cluster, computed as the average of the reference pixels belonging to that cluster. For intra prediction with various block sizes, the block is scaled to the network input size, and the prediction result is scaled back to the original size. In network training, the mean squared error between the original and predicted blocks is used as the loss function, and a penalty on output values far from both ends is added to the loss for clear clustering.
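The training loss described above can be sketched as MSE plus a binarization penalty. This is an assumption: the abstract only says outputs "far from both ends" are penalized, and p·(1−p), which peaks at p = 0.5 and vanishes at 0 and 1, is one common choice for such a term; the weight `lam` is also hypothetical.

```python
import numpy as np

def clustering_loss(pred_block, orig_block, cluster_probs, lam=0.1):
    """MSE reconstruction loss plus a penalty pushing cluster
    probabilities toward 0 or 1 (one plausible form of the paper's
    'far from both ends' penalty; p*(1-p) is an assumed choice)."""
    mse = np.mean((pred_block - orig_block) ** 2)
    penalty = np.mean(cluster_probs * (1.0 - cluster_probs))
    return mse + lam * penalty
```

With hard 0/1 probabilities the penalty vanishes, so the term only pressures ambiguous assignments, which is what yields the "clear network clustering" the training aims for.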
In simulation results, the bit rate is reduced by up to 12.45% under the same distortion compared with the latest video coding standard.