At present, encoder-decoder-based video captioning models mainly rely on a single video input source. The generated captions are limited in content because few studies have employed external corpus information to guide caption generation, which hinders the accurate description and understanding of video contents. To address this issue, this work proposes a novel video captioning method guided by a sentence retrieval generation network (ED-SRG). First, we integrate a ResNeXt network model, an efficient convolutional network for online video understanding (ECO) model, and a long short-term memory (LSTM) network model to construct an encoder-decoder, which is utilized to extract the 2D features, 3D features, and object features of video data, respectively. These features are decoded to generate textual sentences that conform to the video contents and serve as queries for sentence retrieval. Then, a sentence-transformer network model is employed to retrieve sentences in an external corpus that are semantically similar to the generated sentences, and candidate sentences are screened out through similarity measurement. Finally, a novel language model is constructed based on the GPT-2 network structure. The model introduces a designed random selector that randomly selects predicted words with a high probability of appearance in the corpus, which is used to guide the generation of textual sentences that are more in line with natural human language expressions. Experiments on the common datasets MSVD and MSR-VTT, in comparison with existing works, demonstrate that our proposed method generates sentences with richer semantics and outperforms several state-of-the-art approaches.
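The abstract does not specify how the "random selector" chooses among high-probability predicted words. A minimal sketch of one plausible reading (restrict to the k most probable candidates, renormalize, and sample), where `random_select`, `k`, and the toy vocabulary are illustrative assumptions rather than the paper's actual implementation:

```python
import numpy as np

def random_select(probs, k=3, rng=None):
    """Hypothetical random selector: keep the k highest-probability
    candidate words, renormalize their probabilities, and sample one."""
    rng = rng or np.random.default_rng(0)
    probs = np.asarray(probs, dtype=float)
    top = np.argsort(probs)[-k:]          # indices of the k most probable words
    p = probs[top] / probs[top].sum()     # renormalize over the shortlist
    return int(rng.choice(top, p=p))

vocab = ["a", "dog", "cat", "runs", "jumps", "the"]
probs = [0.05, 0.30, 0.10, 0.25, 0.20, 0.10]
idx = random_select(probs, k=3)
word = vocab[idx]  # one of the three most probable words
```

Sampling among several high-probability words, rather than always taking the argmax, is what would introduce the variation in phrasing the abstract attributes to the selector.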
Image compression techniques realized in various ways have become an indispensable part of the practical storage and transmission of digital images. In this study, we present a novel method of lossy compression based on sampling and fuzzy encoding for grayscale images and discuss the problem of their reconstruction. First, an image is divided into a number of non-overlapping blocks of pixels. Next, we perform multiple rounds of random sampling. In each round, a number of pixels are selected as prototypes representing the corresponding block. Each pixel in the block is reconstructed from the gray levels of the prototypes, weighted by membership degrees computed with respect to the distances of that pixel to the prototypes. The reconstruction abilities delivered by the prototypes are quantified by an objective fidelity criterion, and the prototypes leading to the lowest reconstruction error are selected as representatives of the current block. Finally, once the representatives of each block have been determined, we reconstruct the whole image from these prototypes. Experimental studies as well as visual evaluations show that the proposed algorithm achieves high compression ratios while preserving the overall fidelity of the decompressed images.
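The per-block reconstruction step can be sketched as follows, under stated assumptions: memberships are taken to be FCM-style weights based on spatial distance from each pixel to the prototype positions, with fuzzifier `m=2`; the function name, the choice of spatial (rather than gray-level) distance, and mean squared error as the fidelity criterion are all assumptions, not details given in the abstract:

```python
import numpy as np

def reconstruct_block(block, proto_idx, m=2.0):
    """Hedged sketch: rebuild a block from sampled prototype pixels using
    fuzzy (FCM-style) membership degrees. proto_idx holds flat indices of
    the sampled prototype pixels; m is the assumed fuzzifier."""
    flat = block.ravel().astype(float)
    coords = np.array(np.unravel_index(np.arange(flat.size), block.shape)).T
    protos = coords[proto_idx]        # prototype positions
    levels = flat[proto_idx]          # prototype gray levels
    # distance from every pixel to every prototype
    d = np.linalg.norm(coords[:, None, :] - protos[None, :, :], axis=2)
    d = np.maximum(d, 1e-9)           # avoid division by zero at prototypes
    # closer prototypes receive higher membership
    u = 1.0 / (d ** (2.0 / (m - 1.0)))
    u /= u.sum(axis=1, keepdims=True)
    # each pixel is a membership-weighted mix of prototype gray levels
    return (u @ levels).reshape(block.shape)

block = np.arange(16, dtype=float).reshape(4, 4)
recon = reconstruct_block(block, proto_idx=[0, 15])
err = np.mean((block - recon) ** 2)   # objective fidelity criterion (MSE)
```

In the full scheme described above, this reconstruction error would be evaluated for each round of randomly sampled prototypes, and the prototype set with the lowest error kept as the block's representatives.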