ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9414731

Ultra-Low Bitrate Video Conferencing Using Deep Image Animation

Abstract: In this work we propose a novel deep learning approach for ultra-low bitrate video compression for video conferencing applications. To address the shortcomings of current video compression paradigms when the available bandwidth is extremely limited, we adopt a model-based approach that employs deep neural networks to encode motion information as keypoint displacement and reconstruct the video signal at the decoder side. The overall system is trained in an end-to-end fashion minimizing a reconstruction error on…
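The abstract's core idea, transmitting compact keypoint displacements instead of pixel data and reconstructing the frame at the decoder, can be illustrated with a toy sketch. All names below are hypothetical stand-ins: the actual system uses a learned keypoint detector and a neural generator, not the random detector and identity-style decoder shown here.

```python
import numpy as np

def extract_keypoints(frame, num_kp=10, seed=0):
    # Stand-in for a learned keypoint detector: returns (num_kp, 2) coords.
    rng = np.random.default_rng(seed)
    h, w = frame.shape[:2]
    return rng.uniform([0, 0], [h, w], size=(num_kp, 2))

def encode(source_kp, target_kp):
    # Only the displacements are transmitted: a few floats per frame
    # instead of a full image.
    return target_kp - source_kp

def decode(source_kp, displacements):
    # The decoder recovers the target keypoints; in the real system these
    # would then drive a generator network that synthesizes the frame.
    return source_kp + displacements

frame = np.zeros((256, 256, 3))
src_kp = extract_keypoints(frame)
tgt_kp = src_kp + 5.0                 # pretend every keypoint moved 5 px
payload = encode(src_kp, tgt_kp)      # 10 x 2 floats per frame
recon_kp = decode(src_kp, payload)
```

The per-frame payload is just `num_kp` two-dimensional displacement vectors rather than 256x256x3 pixel values, which is what makes ultra-low bitrates feasible.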

Cited by 18 publications (15 citation statements) · References 27 publications
“…Applications to low-bandwidth video chat. Last year, three generative approaches with applications to low-bandwidth video chat appeared simultaneously: [8,11,17]. The transfer of Wang et al.'s approach discussed above [17] to the low-compute regime has not been studied so far, but the dimensionality of the latent space employed in their work is eight times higher than that of FOM, and 32 times higher than that of [11].…”
Section: Related Work
confidence: 99%
“…A server-based solution is described in [1], where the reconstructed video of a sender with a poor network connection could be sent to a receiver, enabling only one-way communication: the low-bandwidth context would prevent the sender from receiving their interlocutor's video. The work of Konuko et al. [8] is fully orthogonal to ours, as it introduces an algorithm to select intra frames to best compress videos using the original FOM algorithm. Finally, [11] presents a variant of FOM that runs on mobile, where the decoder contains SPADE layers to refine some areas of the face.…”
Section: Related Work
confidence: 99%
“…While research on general neural video compression already features a rich body of literature (e.g. [21,8,3,13,9]), there are only a handful of works on neural face video compression [11,6,17]. Oquab et al. [11] study the suitability of different talking-head synthesis approaches for compression, targeting a mobile low-resource scenario.…”
Section: Related Work
confidence: 99%
“…Designing and training compression models specific to video calls is one of the most recent breakthrough stories along these lines [17,11,6], with some works reporting an order-of-magnitude bitrate reduction at a given perceptual quality compared to engineered codecs [17]. In a nutshell, these face video compression algorithms rely on a source frame (view) of the face, warp this view to approximate the target frame to be transmitted, and process the warped source view (or features extracted from it) with a generator to compensate for imperfections in the warp.…”
Section: Introduction
confidence: 99%
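The warp-then-refine pipeline that this last excerpt summarizes can be sketched in miniature. This is a toy illustration under strong assumptions: real systems predict a dense flow field from the transmitted keypoints and refine the result with a trained generator, whereas here the flow is a fixed one-pixel shift and `generator` is a hypothetical stand-in that just adds a residual.

```python
import numpy as np

def warp(source, flow):
    # Backward warp with nearest-neighbour sampling: output[y, x] is taken
    # from source[y - dy, x - dx], clipped to the image borders.
    h, w = source.shape
    ys, xs = np.mgrid[0:h, 0:w]
    sy = np.clip(ys - flow[..., 0], 0, h - 1).astype(int)
    sx = np.clip(xs - flow[..., 1], 0, w - 1).astype(int)
    return source[sy, sx]

def generator(warped, residual):
    # Stand-in for the refinement network that compensates for warp errors.
    return warped + residual

source = np.arange(16.0).reshape(4, 4)   # toy 4x4 "source frame"
flow = np.ones((4, 4, 2))                # uniform shift of one pixel
warped = warp(source, flow)
target = generator(warped, np.zeros((4, 4)))
```

With the uniform one-pixel flow, each output pixel copies its upper-left neighbour from the source, so `target[3, 3]` equals `source[2, 2]`; a learned residual would then repair disoccluded or distorted regions.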