2019 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2019.00024
Remote Heart Rate Measurement From Highly Compressed Facial Videos: An End-to-End Deep Learning Solution With Video Enhancement

Abstract: Remote photoplethysmography (rPPG), which aims at measuring heart activity without any contact, has great potential in many applications (e.g., remote healthcare). Existing rPPG approaches rely on analyzing very fine details of facial videos, which are prone to be affected by video compression. Here we propose a two-stage, end-to-end method using hidden rPPG information enhancement and attention networks, which is the first attempt to counter video compression loss and recover rPPG signals from highly compres…
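As a rough illustration of the rPPG principle the abstract builds on (not the paper's actual network), the raw pulse trace is often approximated by spatially averaging the green channel over a facial region of interest. The helper below is hypothetical, assuming frames arrive as a NumPy array:

```python
import numpy as np

def rppg_trace(frames, roi):
    """Average the green channel over a face ROI for each frame.

    frames: array of shape (T, H, W, 3); roi: (y0, y1, x0, x1).
    Returns a zero-mean 1-D signal whose periodic component
    reflects blood-volume changes in the skin.
    """
    y0, y1, x0, x1 = roi
    trace = frames[:, y0:y1, x0:x1, 1].mean(axis=(1, 2))
    return trace - trace.mean()
```

Video compression attenuates exactly this subtle chrominance variation, which is why the paper enhances the video before attempting extraction.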

Cited by 229 publications (206 citation statements)
References 35 publications
“…PhysNet [11] is a 3D CNN that learns the temporal and spatial context features of face sequences simultaneously to extract the PPG signal, and then measures heart rate variability from that signal. Yu [12] proposed a two-stage network that first enhances the quality of compressed videos and then retrieves the PPG signal from the enhanced videos. The main contribution of [12] is using a CNN to enhance the compressed videos, which is not the focus of this paper.…”
mentioning
confidence: 99%
“…Face spoof detection is similar to DeepFake detection, aiming to determine whether a video contains a live face. Since remote heart-rhythm (HR) measurement techniques have made considerable progress [2,6,35,39,54,55,69], many works use rPPG for face spoofing detection. For example, Li et al. [40] use the pulse difference between real and printed faces to defend against spoofing attacks.…”
Section: Remote Photoplethysmography (rPPG)
mentioning
confidence: 99%
“…In this work, we present DeepRhythm, a novel DeepFake detection technique that is intuitively motivated and designed from the ground up with first principles in mind. Motivated by the fact that remote visual photoplethysmography (PPG) [69] is made possible by monitoring, from a video, the minuscule periodic changes in skin color caused by blood pumping through the face, we conjecture that the normal heartbeat rhythms found in real face videos will be disrupted or even broken entirely in a DeepFake video, making them a powerful indicator for detecting DeepFakes. As shown in Figure 1, existing manipulations, e.g., DeepFakes, significantly alter the sequential signals of the real video, which carry the primary information about the heartbeat rhythm.…”
Section: Introduction
mentioning
confidence: 99%
“…Recently, a growing number of physiological measurement techniques have been proposed based on remote photoplethysmography (rPPG) signals, which can be captured from the face by ordinary cameras without any contact. These techniques make it possible to measure HR, RF, and HRV remotely and have developed rapidly [15,16,3,9,19,22,13,25].…”
Section: Introduction
mentioning
confidence: 99%
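Once a pulse trace is available, HR is commonly estimated as the dominant spectral peak within a physiologically plausible band. A minimal sketch, assuming a clean 1-D signal and a known sampling rate (not a specific method from the cited works):

```python
import numpy as np

def estimate_hr_bpm(signal, fs):
    """Estimate heart rate as the strongest spectral peak in
    0.7-4.0 Hz (42-240 bpm), a band commonly used in rPPG work."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal - np.mean(signal))) ** 2
    band = (freqs >= 0.7) & (freqs <= 4.0)
    return 60.0 * freqs[band][np.argmax(power[band])]
```

Restricting the search to the physiological band suppresses spurious low-frequency peaks from head motion and illumination drift.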
“…Besides the hand-crafted traditional methods, there are also approaches that adopt the strong modeling ability of deep neural networks for remote physiological signal estimation [17,2,13,25]. Most of these methods focus on learning a network mapping from different hand-crafted representations of face videos (e.g., cropped video frames [17,25], motion representations [2], or spatial-temporal maps [13]) to the physiological signals. Note that these hand-crafted representations contain not only the information of physiological signals but also non-physiological information such as head movements, lighting variations, and device noise.…”
Section: Introduction
mentioning
confidence: 99%