As an interesting and challenging problem, automatic image caption generation has attracted increasing attention in the natural language processing and computer vision communities. In this paper, we propose an end-to-end deep learning approach for image caption generation. At each time step, we leverage image feature information at a specific location and generate the corresponding caption description through a semantic attention model. The end-to-end framework allows us to introduce an independent recurrent structure as an attention module, derived by calculating the similarity between the image feature sequence and the semantic word sequence. Additionally, our model is designed to transfer the knowledge representation obtained from the English portion into the Chinese portion to achieve cross-lingual image captioning. We evaluate the proposed model on the most popular benchmark datasets. We report an improvement of 3.9% over existing state-of-the-art approaches for cross-lingual image captioning on the Flickr8k CN dataset on the CIDEr metric. The experimental results demonstrate the effectiveness of our attention model.
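The similarity-based attention step described in this abstract can be illustrated with a minimal sketch. It assumes plain dot-product similarity followed by a softmax, which is a common choice but not necessarily the one used in the paper; the names `attend`, `regions`, and `word` are hypothetical, and the recurrent attention module itself is not reproduced.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(word_vec, region_feats):
    """One decoding step: weight image regions by similarity to the word.

    Assumption: similarity is a dot product; the paper may use another score.
    Returns (attention weights, weighted-sum context vector).
    """
    scores = [dot(word_vec, r) for r in region_feats]
    weights = softmax(scores)
    dim = len(region_feats[0])
    context = [
        sum(w * r[i] for w, r in zip(weights, region_feats))
        for i in range(dim)
    ]
    return weights, context


# Hypothetical toy data: three image regions, one word embedding.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
word = [2.0, 0.0]
weights, context = attend(word, regions)
```

Regions whose features align with the word vector receive larger weights, so the context vector passed to the decoder is dominated by those locations.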
The 3D extension of High Efficiency Video Coding (3D-HEVC) introduces Depth Modeling Modes (DMM) alongside the 35 conventional intra modes to enhance coding quality, but this brings unacceptable computational complexity to the rough mode decision (RMD) and most probable mode (MPM) processes. In this paper, we propose a fast mode decision algorithm for texture map and depth map coding based on gradient information. First, we analyze the characteristics of prediction units (PUs) under the different intra modes that obtain the lowest RD cost. Then, we extract the gradient information of each PU, classify the PU into one of three types of gradient blocks, and select appropriate candidate modes for it, thereby avoiding a search over every mode during coding and skipping the RMD and MPM processes early. Experimental results show that the proposed algorithm achieves an average time saving of 30.6% with a negligible reduction in coding performance.
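The gradient-based classification step can be sketched as follows. The abstract does not name the three block types or the decision rule, so the labels (`flat`, `directional`, `complex`), the thresholds `flat_thresh` and `ratio`, and the use of summed absolute pixel differences are all assumptions made for illustration only.

```python
def gradient_sums(block):
    """Sum absolute horizontal and vertical pixel differences of a 2-D block."""
    h, w = len(block), len(block[0])
    gx = sum(abs(block[y][x + 1] - block[y][x])
             for y in range(h) for x in range(w - 1))
    gy = sum(abs(block[y + 1][x] - block[y][x])
             for y in range(h - 1) for x in range(w))
    return gx, gy


def classify_pu(block, flat_thresh=8, ratio=4.0):
    """Assign a PU to one of three hypothetical gradient classes.

    flat:        almost no gradient energy
    directional: one direction dominates the other by `ratio`
    complex:     everything else
    Thresholds are illustrative, not taken from the paper.
    """
    gx, gy = gradient_sums(block)
    if gx + gy <= flat_thresh:
        return "flat"
    if gx >= ratio * max(gy, 1) or gy >= ratio * max(gx, 1):
        return "directional"
    return "complex"


# Hypothetical 4x4 blocks.
flat_block = [[5, 5, 5, 5] for _ in range(4)]
edge_block = [[0, 0, 100, 100] for _ in range(4)]
checker_block = [[100 * ((x + y) % 2) for x in range(4)] for y in range(4)]

labels = [classify_pu(b) for b in (flat_block, edge_block, checker_block)]
```

An encoder could then map each class to a short candidate list of intra modes instead of testing all of them; that mapping is left out here because the abstract does not specify it.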
Opinion-leader mining in social networks is a critical problem in research on the information dissemination process and in public opinion guidance and supervision. Not every social network user is likely to be an opinion leader. However, most mining methods identify opinion leaders among all users in the network, which adds unnecessary calculations. To solve this problem, we propose a rank after clustering (RaC) algorithm that mines opinion leaders in social networks from a phased-clustering perspective. It has two stages: (1) to reduce the scale of calculation, the clustering stage clusters users with a K-means algorithm according to topological information to find the set of opinion leader candidates; (2) the ranking stage ranks the opinion leader candidates by both their activeness and their influence, where user influence accumulates the followers' influence weighted by degree of attention. In experiments, a new indicator, the C-value, and simulations based on the linear threshold model are used to evaluate the performance of the RaC algorithm. The results show that RaC is effective and accurate.
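The two stages can be sketched in a few lines. This is a simplified illustration, not the RaC algorithm itself: it assumes follower count as the only topological feature, a tiny one-dimensional K-means with two clusters, and a linear mix `alpha` of activeness and influence. The function names, the `attention` lookup, and the toy data are all hypothetical.

```python
def kmeans_1d(values, k=2, iters=20):
    """Tiny 1-D K-means with evenly spaced initial centroids (requires k >= 2)."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centroids[c]))
                  for v in values]
        for c in range(k):
            members = [v for v, l in zip(values, labels) if l == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids


def rank_after_clustering(followers, activeness, attention, alpha=0.5):
    """Stage 1: cluster by follower count. Stage 2: rank the top cluster.

    followers[u]      -> list of users following u
    activeness[u]     -> activity score of u
    attention[(f, u)] -> how closely f attends to u (defaults to 1.0)
    """
    users = sorted(followers)
    degrees = [len(followers[u]) for u in users]
    labels, centroids = kmeans_1d(degrees, k=2)
    top = max(range(2), key=lambda c: centroids[c])
    candidates = [u for u, l in zip(users, labels) if l == top]

    def influence(u):
        # Followers' activity, weighted by their degree of attention to u.
        return sum(attention.get((f, u), 1.0) * activeness[f]
                   for f in followers[u])

    scores = {u: alpha * activeness[u] + (1 - alpha) * influence(u)
              for u in candidates}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy network.
followers = {"a": ["c", "d", "e"], "b": ["c", "d", "e", "f"],
             "c": ["a"], "d": [], "e": [], "f": []}
activeness = {"a": 0.9, "b": 0.2, "c": 0.5, "d": 0.4, "e": 0.3, "f": 0.1}
attention = {("c", "a"): 1.0, ("c", "b"): 0.5}

ranking = rank_after_clustering(followers, activeness, attention)
```

Only the users in the high-degree cluster are scored, which is where the saving in computation comes from: the ranking stage never touches the low-degree majority.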
At present, the rate control algorithm for multiview high-efficiency video coding (MV-HEVC) cannot allocate bits efficiently at the coding tree unit (CTU) layer, and video quality varies greatly for sequences with sudden scene changes or large motions. To overcome this limitation, this paper proposes a rate control algorithm for MV-HEVC based on scene detection. First, we establish a ρ-domain rate control model based on multi-objective optimization. Then, we use image similarity to allocate bits reasonably among viewpoints. If the video scene switches, the image similarity is recalculated, and the correlation between the weights of the inter-viewpoint rates and the correlation between the viewpoints are then analyzed. Finally, the frame layer rate control considers the B-frame layer and other factors when allocating the bit rate, and the basic unit layer rate control adopts different quantization methods according to the content complexity of the CTU. Experimental results show that the proposed rate control algorithm maintains good coding efficiency and decreases the average video quality variation by 25.29%.
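The scene-detection idea can be illustrated with a small sketch. The abstract does not say which image similarity measure is used, so this example assumes histogram intersection on 8-bit luma values and a hypothetical threshold of 0.5; the function names are illustrative only.

```python
def histogram(frame, bins=8, max_val=256):
    """Normalised intensity histogram of a 2-D frame of ints in [0, max_val)."""
    counts = [0] * bins
    for row in frame:
        for p in row:
            counts[p * bins // max_val] += 1
    total = sum(counts)
    return [c / total for c in counts]


def similarity(frame_a, frame_b, bins=8):
    """Histogram intersection in [0, 1]; 1.0 means identical histograms."""
    ha, hb = histogram(frame_a, bins), histogram(frame_b, bins)
    return sum(min(a, b) for a, b in zip(ha, hb))


def scene_changes(frames, threshold=0.5):
    """Indices of frames whose similarity to the previous frame is low."""
    return [i for i in range(1, len(frames))
            if similarity(frames[i - 1], frames[i]) < threshold]


# Hypothetical 2x2 frames: two dark frames followed by a bright one.
dark = [[10, 20], [30, 15]]
dark2 = [[12, 22], [28, 18]]
bright = [[200, 220], [240, 210]]

changes = scene_changes([dark, dark2, bright])
```

In a rate controller, a flagged index would be the point at which similarity-based bit allocation weights are recomputed rather than carried over from the previous scene.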
Most existing rate control algorithms are based on the rate-quantization (R-Q) model. However, as video coding schemes become more flexible, it is very difficult to model the R-Q relationship accurately. Therefore, in this study we propose a novel ρ-domain rate control algorithm for multiview high efficiency video coding (MV-HEVC). First, to further improve the efficiency of MV-HEVC, we use the algorithm from our previous research to optimize the MV-HEVC prediction structure. Then, we establish the ρ-domain rate control model based on multi-objective optimization. Finally, we use image similarity to analyze the correlation between viewpoints, and use encoded information and frame complexity to perform bit allocation and bit rate control at the inter-view, frame, and basic unit layers. The experimental simulation results show that the algorithm maintains high coding efficiency while keeping the average error between the actual bit rate and the target bit rate at only 0.9%.
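For readers unfamiliar with ρ-domain rate control, the sketch below uses the widely cited linear model R = θ(1 − ρ), where ρ is the fraction of quantized transform coefficients equal to zero. It is a generic illustration, not the multi-objective model proposed in the paper; the simple zero test `abs(c) < qstep` and all function names are assumptions.

```python
def zero_fraction(coeffs, qstep):
    """rho: share of coefficients that quantize to zero.

    Assumption: a coefficient becomes zero when its magnitude is below qstep.
    """
    zeros = sum(1 for c in coeffs if abs(c) < qstep)
    return zeros / len(coeffs)


def estimate_theta(bits, rho):
    """Fit theta in R = theta * (1 - rho) from one encoded unit."""
    return bits / (1.0 - rho)


def target_rho(target_bits, theta):
    """Invert the model: the zero fraction needed to hit a bit budget."""
    return 1.0 - target_bits / theta


def rate_error_percent(actual_bits, target_bits):
    """Relative mismatch between produced and target bits, in percent."""
    return abs(actual_bits - target_bits) / target_bits * 100.0


# Hypothetical numbers for one coding unit.
coeffs = [0, 1, -2, 5, -9, 14, 3, 0]
rho = zero_fraction(coeffs, qstep=4)        # 5 of 8 coefficients are zeroed
theta = estimate_theta(bits=300, rho=rho)
next_rho = target_rho(target_bits=200, theta=theta)
error = rate_error_percent(actual_bits=1009, target_bits=1000)
```

The controller would then choose the quantization step whose zero fraction is closest to `next_rho`; the rate error function matches the kind of bit rate mismatch figure reported in the abstract.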