Quality assessment of in-the-wild videos is a challenging problem because of the absence of reference videos and the presence of shooting distortions. Knowledge of the human visual system can help establish objective quality assessment methods for in-the-wild videos. In this work, we show that two eminent effects of the human visual system, namely content-dependency and temporal-memory effects, can be exploited for this purpose. We propose an objective no-reference video quality assessment method that integrates both effects into a deep neural network. For content-dependency, we extract features from a pre-trained image classification neural network for its inherent content-aware property. For temporal-memory effects, long-term dependencies, especially temporal hysteresis, are integrated into the network with a gated recurrent unit and a subjectively-inspired temporal pooling layer. To validate the performance of our method, experiments are conducted on three publicly available in-the-wild video quality assessment databases: KoNViD-1k, CVD2014, and LIVE-Qualcomm. Experimental results demonstrate that our proposed method outperforms five state-of-the-art methods by a large margin, with overall performance improvements of 12.39%, 15.71%, 15.45%, and 18.09% over the second-best method VBLIINDS in terms of SROCC, KROCC, PLCC, and RMSE, respectively. Moreover, an ablation study verifies the crucial role of both the content-aware features and the modeling of temporal-memory effects. The PyTorch implementation of our method is released at https://github.com/lidq92/VSFA.
KEYWORDS: video quality assessment; human visual system; content dependency; temporal-memory effects; in-the-wild videos
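The subjectively-inspired temporal pooling mentioned in the abstract models temporal hysteresis: human raters remember a quality drop longer than a quality gain. A minimal NumPy sketch of pooling in this spirit is below; the window size `tau`, the blend weight `gamma`, and the softmin weighting are illustrative assumptions, not the released hyper-parameters of VSFA.

```python
import numpy as np

def hysteresis_pool(frame_scores, tau=12, gamma=0.5):
    """Blend a pessimistic memory term (worst recent quality) with a
    softmin-weighted current term over the near future, so quality
    drops dominate the pooled score. Illustrative sketch only;
    tau and gamma are assumed values."""
    q = np.asarray(frame_scores, dtype=float)
    T = len(q)
    pooled = np.empty(T)
    for t in range(T):
        # memory element: the worst quality in the recent past window
        l_t = q[max(0, t - tau):t + 1].min()
        # current element: softmin-weighted average of the near future,
        # giving low-quality frames larger weights
        fut = q[t:min(T, t + tau) + 1]
        w = np.exp(-fut) / np.exp(-fut).sum()
        m_t = (w * fut).sum()
        pooled[t] = gamma * l_t + (1 - gamma) * m_t
    return pooled.mean()
```

With this pooling, a video containing a brief quality dip scores below the plain frame-average, reflecting the hysteresis effect.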
Background: The molecular mechanisms of mechanical stress-induced cartilage thinning remain largely unknown. Results: Endoplasmic reticulum stress (ERS) was activated in chondrocytes during mechanical stress loading. ERS inhibition suppressed apoptosis, restored proliferation, and alleviated cartilage thinning. Conclusion: ERS regulates mechanical stress-induced cartilage thinning. Significance: Our data demonstrate a novel pathological role for ERS and provide new insight into the treatment of temporomandibular joint diseases.
Video quality assessment (VQA) is an important problem in computer vision. The videos in computer vision applications are usually captured in the wild. We focus on automatically assessing the quality of in-the-wild videos, which is a challenging problem due to the absence of reference videos, the complexity of distortions, and the diversity of video contents. Moreover, the video contents and distortions among existing datasets differ considerably, which leads to poor performance of data-driven methods in the cross-dataset evaluation setting. To improve the performance of quality assessment models, we borrow intuitions from human perception, specifically the content dependency and temporal-memory effects of the human visual system. To address the cross-dataset evaluation challenge, we explore a mixed-datasets training strategy for training a single VQA model with multiple datasets. The proposed unified framework explicitly includes three stages: a relative quality assessor, a nonlinear mapping, and a dataset-specific perceptual scale alignment, which jointly predict relative quality, perceptual quality, and subjective quality. Experiments are conducted on four publicly available datasets for VQA in the wild, i.e., LIVE-VQC, LIVE-Qualcomm, KoNViD-1k, and CVD2014. The experimental results verify the effectiveness of the mixed-datasets training strategy and demonstrate the superior performance of the unified model in comparison with state-of-the-art models. For reproducible research, we make the PyTorch implementation of our method available at https://github.com/lidq92/MDTVSFA.
Keywords: Content dependency • In-the-wild videos • Mixed datasets training • Temporal-memory effect • Video quality assessment
Communicated by Mei Chen.
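The three-stage pipeline above can be illustrated with a small NumPy sketch: a monotonic nonlinear mapping from relative to perceptual quality, followed by a per-dataset linear alignment to the subjective scale. The 4-parameter logistic form and all parameter values here are illustrative assumptions, not the fitted parameters of the paper.

```python
import numpy as np

def perceptual_map(r, beta=(1.0, 0.0, 0.5, 0.1)):
    """Monotonic 4-parameter logistic mapping a relative quality r
    (assumed in [0, 1]) to perceptual quality. The parameter values
    are illustrative assumptions, not learned ones."""
    b1, b2, b3, b4 = beta
    return b2 + (b1 - b2) / (1.0 + np.exp(-(r - b3) / b4))

def align(p, scale, shift):
    """Dataset-specific linear alignment of perceptual quality to a
    dataset's subjective score range (e.g., a MOS scale)."""
    return scale * p + shift
```

A single relative-quality model can thus be shared across datasets, while only `scale` and `shift` differ per dataset.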
Recognition of surgical gestures is crucial for surgical skill assessment and efficient surgery training. Prior works on this task are based either on graphical models such as HMMs and CRFs, or on deep learning models such as Recurrent Neural Networks and Temporal Convolutional Networks. Most current approaches suffer from over-segmentation and therefore low segment-level edit scores. In contrast, we present an essentially different methodology that models the task as a sequential decision-making process. An intelligent agent is trained using reinforcement learning with hierarchical features from a deep model. Temporal consistency is integrated into our action design and reward mechanism to reduce over-segmentation errors. Experiments on the JIGSAWS dataset demonstrate that the proposed method performs better than state-of-the-art methods in terms of edit score and on par in frame-wise accuracy. Our code will be released later.
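The segment-level edit score mentioned above is commonly defined as one minus the normalized Levenshtein distance between the predicted and ground-truth segment label sequences, scaled to [0, 100]. A sketch of this standard metric, showing why over-segmentation is penalized:

```python
def segments(labels):
    # collapse frame-wise labels into an ordered sequence of segments
    segs = []
    for l in labels:
        if not segs or segs[-1] != l:
            segs.append(l)
    return segs

def edit_score(pred, gt):
    """Segment-level edit score: 100 * (1 - normalized Levenshtein
    distance between segment sequences). Sketch of the metric as
    commonly reported in action segmentation."""
    p, g = segments(pred), segments(gt)
    m, n = len(p), len(g)
    # dynamic-programming table for Levenshtein distance
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i
    for j in range(n + 1):
        D[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if p[i - 1] == g[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,      # deletion
                          D[i][j - 1] + 1,      # insertion
                          D[i - 1][j - 1] + cost)  # substitution
    return 100.0 * (1 - D[m][n] / max(m, n, 1))
```

A fragmented prediction like `a,b,a,b` against ground truth `a,a,b,b` keeps 50% frame accuracy but produces extra segments, so its edit score drops sharply, which is exactly the failure mode the abstract targets.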
Human saccades are a dynamic process of information pursuit. Based on the principle of information maximization, we propose a computational model to simulate human saccadic scanpaths on natural images. The model integrates three related factors as driving forces to guide eye movements sequentially: reference sensory responses, fovea-periphery resolution discrepancy, and visual working memory. For each eye movement, we compute three multi-band filter response maps as a coherent representation of the three factors. The three filter response maps are combined into multi-band residual filter response maps, on which we compute residual perceptual information (RPI) at each location. The RPI map is a dynamic saliency map that varies along with eye movements. The next fixation is selected as the location with the maximal RPI value. On a natural image dataset, we compare the saccadic scanpaths generated by the proposed model and several other visual saliency-based models against human eye movement data. Experimental results demonstrate that the proposed model achieves the best prediction accuracy on both static fixation locations and dynamic scanpaths.
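The fixation-selection loop described above can be sketched as a greedy maximization over a dynamic map: fixate the current maximum, then attenuate the map around that location to emulate information already acquired. The Gaussian suppression footprint and `sigma` here are illustrative assumptions standing in for the model's RPI update, not the paper's actual computation.

```python
import numpy as np

def scanpath(rpi_map, n_fix=3, sigma=2.0):
    """Greedy scanpath sketch: repeatedly pick the maximum of a
    dynamic residual-information map, then suppress the map near the
    chosen fixation (Gaussian footprint and sigma are assumptions)."""
    M = np.array(rpi_map, dtype=float)
    ys, xs = np.mgrid[0:M.shape[0], 0:M.shape[1]]
    path = []
    for _ in range(n_fix):
        # next fixation = location with maximal residual information
        y, x = np.unravel_index(np.argmax(M), M.shape)
        path.append((int(y), int(x)))
        # attenuate information already acquired around the fixation
        M *= 1.0 - np.exp(-((ys - y) ** 2 + (xs - x) ** 2) / (2 * sigma ** 2))
    return path
```

Because the map is updated after every fixation, the resulting saliency landscape is dynamic, matching the abstract's description of the RPI map varying along with eye movements.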