Just-noticeable distortion (JND), which refers to the maximum distortion that the human visual system (HVS) cannot perceive, plays an important role in perceptual image and video processing. In comparison with JND estimation for images, estimation of the JND profile for video needs to take into account the temporal HVS properties in addition to the spatial properties. In this paper, we develop a spatio-temporal model estimating JND in the discrete cosine tranform domain. The proposed model incorporates the spatio-temporal contrast sensitivity function, the influence of eye movements, luminance adaptation, and contrast masking to be more consistent with human perception. It is capable of yielding JNDs for both still images and video with significant motion. The experiments conducted in this study have demonstrated that the JND values estimated for video sequences with moving objects by the model are in line with the HVS perception. The accurate JND estimation of the video towards the actual visibility bounds can be translated into resource savings (e.g., for bandwidth/storage or computation) and performance improvement in video coding and other visual processing tasks (such as perceptual quality evaluation, visual signal restoration/enhancement, watermarking, authentication, and error protection).Index Terms-Contrast sensitivity function (CSF), discrete cosine transform (DCT), eye movement, just-noticeable distortion (JND).
Abstract-Attention is an integral part of the human visual system and has been widely studied in the visual attention literature. The human eyes fixate at important locations in the scene, and every fixation point lies inside a particular region of arbitrary shape and size, which can either be an entire object or a part of it. Using that fixation point as an identification marker on the object, we propose a method to segment the object of interest by finding the "optimal" closed contour around the fixation point in the polar space, avoiding the perennial problem of scale in the Cartesian space. The proposed segmentation process is carried out in two separate steps: First, all visual cues are combined to generate the probabilistic boundary edge map of the scene; second, in this edge map, the "optimal" closed contour around a given fixation point is found. Having two separate steps also makes it possible to establish a simple feedback between the mid-level cue (regions) and the low-level visual cues (edges). In fact, we propose a segmentation refinement process based on such a feedback process. Finally, our experiments show the promise of the proposed method as an automatic segmentation framework for a general purpose visual system.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.