Recently, emotion recognition in conversation (ERC) has become increasingly important for the development of diverse Internet of Things devices, especially those closely connected with users. The majority of deep learning-based ERC methods combine a multilayer bidirectional recurrent feature extractor with an attention module to extract sequential features. In addition, the latest models exploit speaker information and the relationships between utterances through graph networks. However, before the input is fed into the bidirectional recurrent module, detailed intra-utterance features should be obtained without altering their characteristics. In this article, we propose a residual-based graph convolution network (RGCN) and a new loss function. Our RGCN combines a residual network (ResNet)-based intra-utterance feature extractor with a GCN-based inter-utterance feature extractor to fully exploit both intra- and inter-utterance informative features. The ResNet-based intra-utterance feature extractor produces an elaborate context feature for each independent utterance. Then, a condensed feature is obtained through the additional GCN-based inter-utterance feature extractor, which incorporates the neighboring associated features within a conversation. The proposed loss function reflects the edge weights to improve effectiveness. Experimental results demonstrate that the proposed method achieves superior performance compared with state-of-the-art methods.
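The abstract does not include implementation details, so the following is only a minimal sketch of the two-stage idea it describes: a ResNet-style intra-utterance encoder followed by a single graph convolution over the conversation graph. All module names, dimensions, the pooling step, and the number of emotion classes are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Hypothetical ResNet-style block over the token features of one utterance."""
    def __init__(self, dim):
        super().__init__()
        self.conv1 = nn.Conv1d(dim, dim, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(dim, dim, kernel_size=3, padding=1)

    def forward(self, x):                     # x: (utterances, dim, tokens)
        h = F.relu(self.conv1(x))
        h = self.conv2(h)
        return F.relu(x + h)                  # residual connection

class GCNLayer(nn.Module):
    """Single graph convolution: utterance nodes exchange conversational context."""
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, x, adj):                # x: (utterances, dim), adj: normalized (u, u)
        return F.relu(self.lin(adj @ x))

class RGCNSketch(nn.Module):
    """Two-stage sketch: intra-utterance ResNet encoder -> inter-utterance GCN."""
    def __init__(self, dim, num_classes=7):   # 7 emotion classes is an assumption
        super().__init__()
        self.intra = ResidualBlock(dim)
        self.inter = GCNLayer(dim)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, utterances, adj):
        # utterances: (num_utterances, dim, tokens); one row per utterance
        intra_feats = self.intra(utterances).mean(dim=-1)   # pool tokens -> (u, dim)
        inter_feats = self.inter(intra_feats, adj)          # neighbor-aware features
        return self.classifier(inter_feats)                 # per-utterance emotion logits

# Example: 4 utterances, 64-d token embeddings, 10 tokens each, chain-graph adjacency.
x = torch.randn(4, 64, 10)
adj = torch.eye(4) + torch.diag(torch.ones(3), 1) + torch.diag(torch.ones(3), -1)
adj = adj / adj.sum(dim=1, keepdim=True)      # simple row normalization
logits = RGCNSketch(64)(x, adj)
print(logits.shape)                           # torch.Size([4, 7])
```

Under the same assumptions, the edge-weighted loss mentioned in the abstract could be approximated by weighting each utterance's cross-entropy term with weights derived from its incident edges, though the paper's exact formulation is not given here.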
High Efficiency Video Coding (HEVC) has developed rapidly to support new-generation display devices and their ultra-high-definition (UHD) content with high dynamic range (HDR) and wide color gamut (WCG). To support HDR/WCG sequences within the HEVC standard, a pre-/post-processing technique has been designed. After an HDR video is compressed, the reconstructed frames exhibit chromatic distortions that resemble color smearing. To remove this color artifact, we propose an efficient compression algorithm for HDR sequences based on block-level quantization parameter (QP) offset control. First, we extract candidate coding units (CUs) containing regions that are visually annoying to the human eye, based on the just noticeable distortion (JND) model. Subsequently, the chromatically distorted blocks are verified by an activity function, since the chromatic artifact is observed near strong edges. For the verified artifact blocks, we reassign the QPs of the Cb and Cr chroma components. Our experimental results show that the proposed method yields average BD-rate gains of 3.3% for U and 3.4% for V, with a negligible average bitrate increase of 0.3%.
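The exact JND model and activity function are not specified in the abstract; the sketch below only illustrates the overall block-level decision flow (JND-based candidate screening, an edge-activity check, and a chroma QP offset for flagged CUs) with simple placeholder measures. The thresholds, the contrast proxy, and the offset value of -2 are all assumptions.

```python
import numpy as np

def chroma_qp_offsets(luma_block, jnd_threshold=3.0, activity_threshold=10.0,
                      qp_offset=-2):
    """Illustrative block-level decision (not the paper's exact model):
    flag a CU as a chroma-artifact candidate when its distortion visibility
    exceeds a JND-like threshold and its edge activity is high, then return
    QP offsets for the Cb/Cr components of that CU."""
    block = luma_block.astype(np.float64)

    # Placeholder "JND" proxy: local contrast of the luma block.
    contrast = block.max() - block.min()

    # Placeholder activity function: mean absolute horizontal/vertical gradient,
    # standing in for the strong-edge check described in the abstract.
    gh = np.abs(np.diff(block, axis=1)).mean()
    gv = np.abs(np.diff(block, axis=0)).mean()
    activity = gh + gv

    if contrast > jnd_threshold and activity > activity_threshold:
        return {"cb": qp_offset, "cr": qp_offset}   # spend more bits on chroma
    return {"cb": 0, "cr": 0}                       # leave QPs unchanged

# Example: a synthetic 16x16 block with a strong vertical edge.
block = np.zeros((16, 16), dtype=np.uint8)
block[:, 8:] = 255
print(chroma_qp_offsets(block))   # {'cb': -2, 'cr': -2}
```

A negative offset lowers the chroma QP so that flagged blocks receive finer chroma quantization; whether the paper lowers or otherwise adjusts the chroma QPs is not stated in the abstract, so the sign used here is an assumption.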
In video coding, inter bi-prediction improves coding efficiency significantly by producing a precise fused prediction block. Although block-wise bi-prediction methods, such as bi-prediction with CU-level weights (BCW), are applied in Versatile Video Coding (VVC), a linear fusion-based strategy still struggles to represent diverse pixel variations inside a block. In addition, a pixel-wise bi-prediction method called bi-directional optical flow (BDOF) has been proposed to refine the bi-prediction block. However, the non-linear optical flow equation in BDOF mode is applied under restrictive assumptions, so this method is still unable to accurately compensate for various kinds of bi-prediction blocks. In this paper, we propose an attention-based bi-prediction network (ABPN) to replace the existing bi-prediction methods entirely. The proposed ABPN is designed to learn efficient representations of the fused features by utilizing an attention mechanism. Furthermore, a knowledge distillation (KD)-based approach is employed to compress the proposed network while keeping its output comparable to that of the large model. The proposed ABPN is integrated into the VTM-11.0 NNVC-1.0 standard reference software. Compared with the VTM anchor, the lightweight ABPN achieves BD-rate reductions of up to 5.89% and 4.91% on the Y component under the random access (RA) and low-delay B (LDB) configurations, respectively.
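The network details of ABPN are not given in the abstract, so the following PyTorch-style toy model only illustrates the general idea of attention-weighted fusion of the two motion-compensated prediction blocks (as opposed to the fixed linear weights of BCW). Layer counts, channel widths, and the sigmoid attention map are assumptions.

```python
import torch
import torch.nn as nn

class AttentionFusionSketch(nn.Module):
    """Toy stand-in for an attention-based bi-prediction fusion network:
    two motion-compensated prediction blocks (L0, L1) are fused with
    learned per-pixel weights instead of fixed block-level weights."""
    def __init__(self, channels=16):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Per-pixel attention map in [0, 1] that weighs L0 against L1.
        self.attention = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=3, padding=1), nn.Sigmoid(),
        )

    def forward(self, pred_l0, pred_l1):          # each: (batch, 1, H, W)
        feats = self.features(torch.cat([pred_l0, pred_l1], dim=1))
        w = self.attention(feats)                 # (batch, 1, H, W)
        return w * pred_l0 + (1.0 - w) * pred_l1  # pixel-wise fused prediction

# Example usage on random normalized prediction blocks.
l0 = torch.rand(1, 1, 64, 64)
l1 = torch.rand(1, 1, 64, 64)
fused = AttentionFusionSketch()(l0, l1)
print(fused.shape)   # torch.Size([1, 1, 64, 64])
```

Under the same caveat, the KD-based compression mentioned in the abstract would amount to training a smaller variant of such a network to reproduce the fused output of a larger teacher model.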