Video steganography plays an important role in secret communication: it conceals a secret video within a cover video by perturbing pixel values in the cover frames. Imperceptibility is the first and foremost requirement of any steganographic approach. Inspired by the fact that human eyes perceive pixel perturbations differently in different video areas, a novel, effective and efficient Deeply‐Recursive Attention Network (DRANet) for video steganography is proposed, which finds suitable areas for information hiding by modelling spatio‐temporal attention. The DRANet contains two key components: a Non‐Local Self‐Attention (NLSA) block and a Non‐Local Co‐Attention (NLCA) block. Specifically, the NLSA block selects cover‐frame areas that are suitable for hiding by computing inter‐ and intra‐frame correlations across the cover frames. The NLCA block produces enhanced representations of the secret frames to improve the robustness of the model and to alleviate the influence of different areas in the secret video. Furthermore, the DRANet reduces the number of model parameters by recursively applying the same operations to the different frames of an input video. Experimental results show that the proposed DRANet achieves better performance with fewer parameters than state‐of‐the‐art competitors.
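The abstract does not give the equations behind the NLSA block, but the inter- and intra-frame correlation it describes matches the generic non-local self-attention operation: flatten all positions of all frames into one token sequence so every position attends to every other, within and across frames. Below is a minimal numpy sketch of that generic operation; the function name, shapes, and single-head formulation are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def non_local_self_attention(frames):
    """Generic non-local self-attention over a stack of frames (illustrative).

    frames: (T, H, W, C) feature maps for T cover frames.
    Flattening all frames into one token sequence lets every spatial
    position attend to every other position, capturing both intra-frame
    and inter-frame correlations in a single matrix product.
    """
    T, H, W, C = frames.shape
    x = frames.reshape(T * H * W, C)           # all positions as tokens
    attn = softmax(x @ x.T / np.sqrt(C))       # pairwise correlation weights
    out = attn @ x                             # attention-weighted features
    return out.reshape(T, H, W, C)

feat = np.random.rand(2, 4, 4, 8).astype(np.float32)
out = non_local_self_attention(feat)
print(out.shape)  # (2, 4, 4, 8)
```

In a full model, learned query/key/value projections and a residual connection would wrap this operation; the sketch keeps only the correlation-and-aggregate core.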
In this paper, we propose a simple yet effective transformer framework for self-supervised learning, called DenseDINO, to learn dense visual representations. To exploit the spatial information that dense prediction tasks require but existing self-supervised transformers neglect, we introduce point-level supervision across views in a novel token-based way. Specifically, DenseDINO introduces extra input tokens, called reference tokens, to match point-level features using a position prior. With the reference tokens, the model can maintain spatial consistency and handle complex multi-object scenes, and thus generalizes better to dense prediction tasks. Compared with vanilla DINO, our approach obtains competitive performance on ImageNet classification and achieves a large improvement (+7.2% mIoU) in semantic segmentation on PascalVOC under the linear probing protocol for segmentation.
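The abstract describes reference tokens as extra input tokens carrying a point-level position prior. One plausible reading is that each reference token is initialized from the positional embedding at a sampled point and appended to the patch-token sequence, so the transformer can match that point's features across views. The numpy sketch below illustrates only this token-assembly step; the function name, shapes, and initialization scheme are assumptions for illustration, not DenseDINO's actual code.

```python
import numpy as np

def add_reference_tokens(patch_tokens, pos_embed, points, grid_w):
    """Append reference tokens to a ViT token sequence (illustrative).

    patch_tokens: (N, D) patch embeddings for one view.
    pos_embed:    (N, D) positional embeddings of the patch grid.
    points:       list of (row, col) patch coordinates sampled in the image.
    grid_w:       width of the patch grid, so index = row * grid_w + col.

    Each reference token is initialized from the positional embedding at
    its point, giving the transformer an explicit position prior it can
    use to match point-level features across augmented views.
    """
    refs = np.stack([pos_embed[r * grid_w + c] for r, c in points])
    return np.concatenate([patch_tokens, refs], axis=0)

N, D, grid_w = 16, 32, 4       # 4x4 patch grid, 32-dim embeddings
tokens = np.random.rand(N, D)
pos = np.random.rand(N, D)
out = add_reference_tokens(tokens, pos, [(0, 1), (2, 3)], grid_w)
print(out.shape)  # (18, 32)
```

The appended tokens are then processed by the same transformer layers as the patch tokens, which is what makes the supervision "token-based" rather than a separate dense head.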