Foveated multipoint videoconferencing at low bit rates

Sheikh, H.R.; Liu, Shizhong; Wang, Zhou; Bovik, Alan C.

doi:10.1109/icassp.2002.5745041

Cited by 4 publications

(5 citation statements)

References 7 publications


“…In the first of these, a foveated retinal sampling geometry is used to either apply a foveating coordinate transformation on an original uniform resolution image [33], or to average and map local pixel groups into superpixels [34], [35]. Filter-based methods process images with space-varying low-pass filter with cut-off frequencies determined by foveated resolution-reduction protocols [36], [37]. Multiresolution methods foveation involves decomposing images into bandpass scales, and only retaining scales specified by a foveal fall-off function defined relative to a measured or presumed fixation point [4], [38].…”

Section: A Foveated Video Compression

mentioning

confidence: 99%

Foveation-based Deep Video Compression without Motion Search

Chen, Webb, Bovik

2022

Preprint

Self Cite


Virtual Reality (VR) and its applications have attracted significant and increasing attention. However, the requirements of much larger file sizes, different storage formats, and immersive viewing conditions pose significant challenges to the goals of acquiring, transmitting, compressing, and displaying high-quality VR content. At the same time, the great potential of deep learning to advance progress on the video compression problem has driven a significant research effort. Because of the high bandwidth requirements of VR, there has also been significant interest in the use of space-variant, foveated compression protocols. We have integrated these techniques to create an end-to-end deep learning video compression framework. A feature of our new compression model is that it dispenses with the need for expensive search-based motion prediction computations. This is accomplished by exploiting statistical regularities inherent in video motion expressed by displaced frame differences. Foveation protocols are desirable since, unlike traditional flat-panel displays, only a small portion of a video viewed in VR may be visible as a user gazes in any given direction. Moreover, even within a current field of view (FOV), the resolution of retinal neurons rapidly decreases with distance (eccentricity) from the projected point of gaze. In our learning-based approach, we implement foveation by introducing a Foveation Generator Unit (FGU) that generates foveation masks which direct the allocation of bits, significantly increasing compression efficiency while making it possible to retain an impression of little to no additional visual loss given an appropriate viewing geometry.
Our experimental results reveal that our new compression model, which we call the Foveated MOtionless VIdeo Codec (Foveated MOVI-Codec), is able to efficiently compress videos without computing motion, while outperforming foveated versions of both H.264 and H.265 on the widely used UVG dataset and on the HEVC Standard Class B Test Sequences. The Foveated MOVI-Codec project page can be found at https://github.com/Meixu-Chen/Foveated-MOVI-Codec.



“…In addition, identifying a dominant speaker requires periodical analysis of conversational patterns from different clients and the ensuing unequal rate control. The rate control typically applies a form of foveating such that the visual clarity of a dominant speaker will appear sharper, relative to the nondominant speakers [29]. Research into unequal rate control for a dominant speaker applied dynamic bit allocation and dynamic region of interest transcoding [31,32,19,11].…”

Section: Related Work

mentioning

confidence: 99%

“…29), $Y$ represents a set of speech durations of a loudest speaker during a video communication session, with $y_i \in Y$. For instance, in Fig.…”

mentioning

confidence: 99%

“…However, the length of a MVC session is not fixed with durations ranging anywhere between a few minutes to several hours. Therefore in (29), $\mu_s$ represents a sample mean speech length and there is no measure of difference between $\mu_s$ and the population mean speech length, $\mu_p$. As such, a 95% confidence interval for the mean is computed as an observed interval, which acts as a good estimate to the unknown $\mu_p$. The lower endpoint, $\lambda$, of a confidence interval is computed as

$$\lambda = \bar{X} - z \times \frac{\sigma_s}{\sqrt{|Y|}} \tag{30}$$

where $\bar{X} = \mu_s$, $\sigma_s$ represents the standard deviation of the sampled $Y$ speech lengths and $z$ represents the critical value obtained from the Standard Normal table.…”

mentioning

confidence: 99%
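The lower endpoint described in the statement above can be illustrated with a short sketch. It assumes the usual normal-approximation form (sample mean minus the critical value times the standard error, with the sample size |Y| under a square root) and uses z = 1.96 for a 95% interval. The function name `ci_lower_endpoint` and the sample durations are hypothetical and are not taken from the cited paper.

```python
import math
import statistics


def ci_lower_endpoint(speech_lengths, z=1.96):
    """Lower endpoint of a normal-approximation confidence interval.

    speech_lengths: sampled speech durations (the set Y in the quote).
    z: critical value from the standard normal table (1.96 for 95%).
    Assumption: the standard error is sigma_s / sqrt(|Y|).
    """
    n = len(speech_lengths)
    sample_mean = statistics.fmean(speech_lengths)   # mu_s
    sample_sd = statistics.stdev(speech_lengths)     # sigma_s (sample std. dev.)
    return sample_mean - z * sample_sd / math.sqrt(n)


# Hypothetical speech durations in seconds, for illustration only.
durations = [4.0, 6.0, 8.0, 10.0, 12.0]
lower = ci_lower_endpoint(durations)
print(f"lower endpoint: {lower:.3f}")
```

With so few samples a Student-t critical value would normally be more appropriate; the sketch keeps the standard normal value because that is what the quoted passage names.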


Dominant speaker detection in multipoint video communication using Markov chain with non-linear weights and dynamic transition window

Baskaran, Chang, Loo, et al.

2018

Information Sciences


This paper proposes an enhanced discrete-time Markov chain algorithm for predicting dominant speaker(s) in multipoint video communication systems in the presence of transient speech. The proposed algorithm exploits statistical properties of the past speech patterns to accurately predict the dominant speaker for the next time state. Non-linear weights-based coefficients are employed in the enhanced Markov chain for both the initial state vector and transition probability matrix. These weights significantly improve the time taken to predict a new dominant speaker during a conference session. In addition, a mechanism to dynamically modify the size of the transition probability matrix window/container is introduced to improve the adaptability of the Markov chain towards the variability of speech characteristics. Simulation results indicate that for a test scenario with 11 conference participants, the enhanced Markov chain prediction algorithm registered an 85% accuracy in predicting a dominant speaker when compared to an ideal case where there is no transient speech. Misclassification of dominant speakers due to transient speech was also reduced by 87%.
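As a rough illustration of the baseline idea in this abstract, the sketch below fits a plain first-order discrete-time Markov chain to a history of dominant-speaker labels and predicts the most likely next speaker. It is not the paper's enhanced algorithm: it uses uniform counts and a fixed history, with no non-linear weights and no dynamic transition window. The function names and the speaker labels are hypothetical.

```python
from collections import Counter, defaultdict


def fit_transitions(history):
    """Estimate first-order transition probabilities from a label sequence."""
    counts = defaultdict(Counter)
    for current, nxt in zip(history, history[1:]):
        counts[current][nxt] += 1
    # Normalize each row of counts into probabilities.
    return {
        state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for state, row in counts.items()
    }


def predict_next(transitions, current):
    """Return the most probable next speaker given the current one."""
    row = transitions.get(current)
    if not row:
        # Unseen state: assume the current speaker keeps the floor.
        return current
    return max(row, key=row.get)


# Hypothetical sequence of dominant speakers, one label per time state.
history = ["A", "A", "B", "A", "A", "A", "B", "B"]
transitions = fit_transitions(history)
print(predict_next(transitions, "A"))
```

Weighting recent transitions more heavily than old ones, as the abstract describes, would replace the `+= 1` increment with a weight that depends on the age of the observation.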


“…However, in some applications, users focus more on some regions of images and expect better quality in those regions. For example, in a videoconference environment, more user attention is paid to the face region of the speaker than other regions [28]. Yet another example is in the remote education applications, where students focus mostly on the teacher or some specific region of the blackboard or the lecture slide.…”

Section: Foveation-based Rate Shaping

mentioning

confidence: 99%

A practical foveation-based rate-shaping mechanism for MPEG videos

Ho, Cheng

2005

IEEE Trans. Circuits Syst. Video Technol.


Foveation is one of the nonuniform resolution properties of the human visual system. Recently, different foveation models have been proposed and utilized for image and video coding, for the sake of bit-rate saving with no or minor perceptual quality distortion. In the first part of this paper, we propose an efficient and practical DCT-domain foveation model, which is deduced from existing experimental results. In the second part, we present a foveation-based rate-shaping mechanism for MPEG bitstreams, as an application example of the proposed foveation model. The rate shaper is based on eliminating DCT coefficients embedded in MPEG bitstreams. An efficient rate-shaping mechanism is developed to meet various bit-rate requirements. Our simulation confirmed that the proposed foveation model and the rate-shaping mechanism are practical for real-world usage.

