Abstract - We present a Bayesian model that automatically generates fixations/foveations and that can be suitably exploited for compression purposes. The twofold aim of this work is to investigate how exploiting the high-level perceptual cues provided by human faces occurring in the video can enhance the compression process without reducing the perceived quality of the video, and to validate this assumption with an extensive and principled experimental protocol. To this end, the model integrates top-down and bottom-up cues to choose the fixation point on a video frame: at the highest level, a fixation is driven by prior information and by relevant objects, namely human faces, within the scene; at the same time, local saliency together with novel and abrupt visual events contributes by triggering lower-level control. The performance of the resulting video compression system has been evaluated with respect to both the perceived quality of foveated video clips and the compression gain, through an extensive evaluation campaign that eventually involved 200 subjects.

Index Terms - Foveated video coding, foveation filtering, image coding, face detection, video quality measurement.
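The combination of top-down cues (faces) and bottom-up cues (local saliency) described above can be sketched as a simple weighted fusion of two maps. The map shapes, weight values, and linear combination below are illustrative assumptions, not the paper's actual Bayesian formulation:

```python
import numpy as np

def select_fixation(saliency, face_prior, w_face=0.7, w_saliency=0.3):
    """Pick a fixation point on a frame by fusing a bottom-up saliency
    map with a top-down face prior (both HxW arrays in [0, 1]).
    Weights and the linear fusion rule are illustrative assumptions."""
    combined = w_face * face_prior + w_saliency * saliency
    # The fixation is placed at the maximum of the combined map.
    y, x = np.unravel_index(np.argmax(combined), combined.shape)
    return int(x), int(y)
```

In a foveated coder, the region around the returned point would then be encoded at full resolution while peripheral regions are filtered or quantized more aggressively.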
We propose a synchronization control scheme that achieves both speech/video Intra-Stream synchronization and Inter-Stream synchronization for videoconferencing services over IP networks. The driving principle of our scheme is to guarantee the Intra-Sync speech timing relationships (hence the speech quality) and to adjust the video Intra-Sync and the Inter-Sync accordingly. Towards this aim, we use preventive control for the speech stream and reactive control for the video stream. More precisely, we use an adaptive playout algorithm that keeps the Intra-Sync constraints within the talkspurts, while network jitter is compensated by modifying only the silence period lengths on the basis of both speech and video packet delays. We implemented our scheme in a prototype, which allowed us to test the effectiveness of our solution: we observed both perfect speech intelligibility and very satisfactory user-perceived lip-sync. The latter follows from the fact that the Inter-Sync error is concentrated only at the beginning of the talkspurts, where known experimental tests have shown it is not detectable.
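The adaptive playout idea above can be sketched as follows: the playout delay is frozen inside a talkspurt (preserving speech timing), and re-estimated only at silence periods, where stretching or shrinking is inaudible. The function name, the smoothed delay/variation inputs, and the classic d + beta*v update rule are illustrative assumptions, not the paper's exact control law:

```python
def playout_delay_update(current_delay, est_delay, est_var, in_talkspurt, beta=4.0):
    """Return the playout delay (ms) for the next packet.

    Inside a talkspurt the delay is kept fixed, so intra-stream speech
    timing is never distorted. At a talkspurt boundary (i.e. during a
    silence period) the delay is re-set from smoothed network-delay
    estimates, absorbing jitter by resizing the silence period only.
    The safety-margin rule est_delay + beta * est_var is an assumption."""
    if in_talkspurt:
        return current_delay
    return est_delay + beta * est_var
```

A video scheduler would then react to the resulting speech timeline, skipping or delaying frames to keep the Inter-Sync error bounded.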
In [1] we introduced a new real-time variable frame rate control scheme. It is based on video jerkiness and can be applied to both coding and transcoding. The scheme constantly traces the motion of the incoming video and automatically tunes the outgoing frame rate according to the level of jerkiness acceptable to the user. It was conceived in the framework of mobile communications, which calls for optimal use of both the available bandwidth and terminal resources. In this paper we present a subjective assessment of our solution, carried out in a suitably equipped professional laboratory. Towards this aim, a group of non-expert users was asked to express their preference when watching, side by side, the same video coded at variable frame rate and at fixed frame rate. Results show that, most of the time, a variable frame rate control based on a dynamic bit/frame allocation scheme can substantially improve the video quality perceived by users.
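The motion-driven frame rate tuning described above can be sketched as a mapping from a motion-activity measure to an output frame rate: low motion tolerates a low frame rate (freeing bits for spatial quality), while high motion requires a high frame rate to avoid visible jerkiness. The linear mapping, thresholds, and parameter names are illustrative assumptions, not the control law of [1]:

```python
def target_frame_rate(motion_activity, f_min=7.5, f_max=30.0, threshold=0.5):
    """Map a normalized motion-activity measure in [0, 1] to an output
    frame rate in frames per second. Below the jerkiness threshold the
    minimum rate suffices; above it, the rate scales linearly up to the
    source rate. All parameter values here are illustrative assumptions."""
    motion_activity = min(max(motion_activity, 0.0), 1.0)  # clamp to [0, 1]
    if motion_activity <= threshold:
        return f_min
    # scale linearly from f_min to f_max above the threshold
    span = (motion_activity - threshold) / (1.0 - threshold)
    return f_min + span * (f_max - f_min)
```

In a rate-controlled coder or transcoder, the bits saved by dropping frames at low motion would be reallocated to improve the quality of the frames that are kept.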