The next generation of wireless networks fosters the adoption of latency-critical applications such as XR, connected industry, and autonomous driving. This survey gathers implementation aspects of different image and video coding schemes and discusses their tradeoffs. Standardized video coding technologies such as HEVC and VVC provide high compression ratios, but their enormous complexity sets the scene for alternative approaches such as still image, mezzanine, or texture compression in scenarios with tight resource or latency constraints. Regardless of the coding scheme, we identified inter-device memory transfers and the lack of sub-frame coding as limitations of current full-system and software-programmable implementations.
Readily available RGB-D cameras in smartphones and improving 3D scanning technologies have made it possible to produce detailed point cloud and point-based models of real-world objects, even in real time. Rendering such models in high quality and at satisfactory frame rates is needed for realistic extended reality (XR) applications. This publication reviews photorealistic point cloud rendering methods that directly ray trace or rasterize point cloud models, with an emphasis on ray tracing and real-time performance. We found that real-time direct point cloud ray tracing research has focused on static, non-animated content; open research possibilities therefore include adapting modern dedicated ray tracing hardware to increase performance for animated and live-captured scenes, and adding path tracing techniques to increase photorealism in the rendered result. A categorization and discussion of the capabilities of state-of-the-art photorealistic point cloud rendering methods is presented by surveying both real-time and offline methods, the latter of which are expected to become real-time capable with advances in near-future hardware. Challenges and future trends are derived by comparing different rasterization and ray tracing methods, as well as acceleration structures for point clouds, in terms of produced rendering effects and speed.
Latency-critical computer vision systems, such as autonomous driving or drone control, require fast image or video compression when offloading neural network inference to a remote computer. To ensure low latency on a near-sensor edge device, we propose the use of lightweight encoders with constant bitrate and pruned encoding configurations, namely ASTC and JPEG XS. Pruning introduces significant distortion, which we show can be recovered by retraining the neural network with compressed data after decompression. This approach modifies neither the network architecture nor the coding format. By retraining with compressed datasets, we reduced the classification accuracy and segmentation mean intersection over union (mIoU) degradation due to ASTC compression to 4.9-5.0 percentage points (pp) and 4.4-4.0 pp, respectively. With the same method, the mIoU loss due to JPEG XS compression at the main profile was reduced to 2.7-2.3 pp. In terms of encoding speed, our ASTC encoder implementation is 2.3x faster than JPEG. Although the JPEG XS reference encoder requires optimizations to reach low latency, we showed that disabling significance flag coding saves 22-23% of encoding time at the cost of 0.4-0.3 mIoU after retraining.
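The retraining idea above can be sketched in a few lines: each training image is passed through a lossy encode-decode round trip before it reaches the network, so the model learns on the same artifacts it will see at inference time. The sketch below is illustrative only; the function names are hypothetical, and coarse uniform quantization stands in for the actual ASTC or JPEG XS codecs, which would replace `lossy_round_trip` in a real pipeline.

```python
def lossy_round_trip(pixels, step=32):
    """Stand-in for encode+decode: quantize each 8-bit value to a coarse grid.
    In the paper's setting this would be an ASTC or JPEG XS encode/decode pass."""
    return [min(255, (p // step) * step + step // 2) for p in pixels]

def augment_with_compression(dataset, step=32):
    """Replace each training image with its compressed-then-decompressed
    version; the network architecture and labels are left untouched."""
    return [(lossy_round_trip(img, step), label) for img, label in dataset]

# Toy dataset of (pixel list, label) pairs; a real dataset would hold images.
dataset = [([0, 17, 130, 255], 0), ([64, 64, 200, 31], 1)]
retrain_set = augment_with_compression(dataset)
# retrain_set now feeds the unmodified network's standard training loop.
```

Because only the training data changes, this fits any existing training loop: the same loss, optimizer, and architecture are reused, which is why the abstract notes that no network or format modifications are required.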