DeepFovea (2019)
DOI: 10.1145/3355089.3356557
Abstract: To provide an immersive visual experience, modern displays require head mounting, high image resolution, low latency, and a high refresh rate. This poses a challenging computational problem. On the other hand, the human visual system can consume only a tiny fraction of this video stream due to the drastic acuity loss in peripheral vision. Foveated rendering and compression can save computation by reducing image quality in the periphery. However, this can cause noticeable artifact…
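The abstract's premise is that perceivable detail falls off sharply with eccentricity (angular distance from the gaze point), so a foveated system can spend fewer samples in the periphery. A minimal sketch of that idea, assuming a hypothetical reciprocal falloff model (`e2` is an illustrative half-resolution eccentricity, not a parameter from the paper):

```python
import numpy as np

def foveation_density(h, w, gaze_xy, e2=0.05):
    """Per-pixel sampling density that falls off with eccentricity.

    Illustrative model: density ~ 1 / (1 + ecc / e2), loosely inspired by
    cortical magnification. `e2` is a hypothetical half-resolution
    eccentricity expressed as a fraction of image width.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    gx, gy = gaze_xy
    # Eccentricity: normalized distance of each pixel from the gaze point.
    ecc = np.hypot((xs - gx) / w, (ys - gy) / w)
    return 1.0 / (1.0 + ecc / e2)

# Average fraction of full sampling a foveated renderer would keep
# for a centered gaze on a 640x360 frame:
d = foveation_density(360, 640, gaze_xy=(320, 180))
print(round(float(d.mean()), 3))
```

Density is 1.0 at the gaze point and drops toward the edges; the mean over the frame hints at the savings a renderer could realize by sampling proportionally to this map.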

Cited by 88 publications (10 citation statements)
References 44 publications
“…Their experiments reveal the effectiveness of the cloud VR system with foveated rendering. The authors of [20] conducted experimental evaluations of DeepFovea, an AI-assisted foveated rendering process, and achieved more than 14× compression on RGB video with no significant degradation in user-perceived quality. Their results indicate that deep-learning-based foveation would significantly reduce the rendering load.…”
Section: Cloud VR
confidence: 99%
“…Its 6G vision and technology trends study is expected to be completed by 2023. The ITU-T Focus Group on Technologies for Network 2030 (FG NET-2030) was established by ITU-T Study Group 13 at its meeting in Geneva, 16–27 July 2018. It intends to study the capabilities of networks for the year 2030 and beyond, when they are expected to support novel forward-looking scenarios, such as holographic-type communications, extremely fast response in critical situations, and the high-precision communication demands of emerging market verticals.…”
Section: Introduction
confidence: 99%
“…Using eye-tracking devices, however, introduces an additional processing delay to the system and works only with compatible client devices. Kaplanyan et al. [23] proposed a neural-network-based codec for 3D and AR content. This work also assumes the availability of an eye-gaze tracker at the client.…”
Section: Related Work
confidence: 99%
“…DeepGame takes a learning-based approach to understand the player's contextual interest within the game, predict the regions of interest (ROIs) across frames, and allocate bits to different regions based on their importance. Unlike prior works (e.g., [17, 20, 23]), DeepGame neither needs additional feedback from players nor modifies the existing encoders. Thus, DeepGame is easier to deploy in practice.…”
Section: Introduction
confidence: 99%
“…Focal computer vision relies on a nonhomogeneous compression of an image that maintains the pixel information at the center of fixation and strongly compresses it at the periphery, including pyramidal encoding (Kortum & Geisler, 1996; Butko & Movellan, 2010), local wavelet decomposition (Daucé, 2018), and log-polar encoding (Traver & Bernardino, 2010). A recent deep-learning-based implementation of such compression shows that, in a video stream, a log-polar sampling of the image is sufficient to provide a reconstruction of the whole image (Kaplanyan et al., 2019). However, this particular algorithm lacks a mechanism for predicting the best saccadic action to perform.…”
Section: State of the Art
confidence: 99%
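The log-polar sampling mentioned in the last statement can be sketched in a few lines: ring radii grow geometrically outward from the fixation point, so samples are dense at the fovea and sparse in the periphery. This is a minimal nearest-neighbor illustration, not the cited implementation; the function and parameter names (`n_rings`, `n_wedges`, `r_min`) are hypothetical.

```python
import numpy as np

def log_polar_sample(img, center, n_rings=32, n_wedges=64, r_min=1.0):
    """Sample an image on a log-polar grid around a fixation point.

    Ring radii grow geometrically from r_min to the largest radius that
    fits inside the image, giving the nonhomogeneous compression
    described above: dense near fixation, sparse in the periphery.
    Nearest-neighbor lookup for simplicity.
    """
    h, w = img.shape[:2]
    cy, cx = center
    r_max = min(cy, cx, h - 1 - cy, w - 1 - cx)
    # Geometric radius progression: logarithmic in radius, uniform in angle.
    radii = r_min * (r_max / r_min) ** (np.arange(n_rings) / (n_rings - 1))
    thetas = np.linspace(0.0, 2.0 * np.pi, n_wedges, endpoint=False)
    rr, tt = np.meshgrid(radii, thetas, indexing="ij")
    ys = np.clip(np.round(cy + rr * np.sin(tt)).astype(int), 0, h - 1)
    xs = np.clip(np.round(cx + rr * np.cos(tt)).astype(int), 0, w - 1)
    return img[ys, xs]  # shape (n_rings, n_wedges)

img = np.arange(128 * 128, dtype=float).reshape(128, 128)
lp = log_polar_sample(img, center=(64, 64))
print(lp.shape)  # (32, 64): 2048 samples for a 16384-pixel image
```

Here a 128×128 frame is reduced to a 32×64 sample grid, an 8× reduction; a learned decoder (as in the cited work) would then reconstruct the full frame from such a sampling.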