Multi-label image recognition aims to predict the set of object labels present in an image. As objects normally co-occur in an image, it is desirable to model label dependencies to improve recognition performance. To capture and exploit these dependencies, we propose a multi-label classification model based on a Graph Convolutional Network (GCN). The model builds a directed graph over the object labels, where each node (label) is represented by the word embedding of that label, and the GCN is trained to map this label graph into a set of inter-dependent object classifiers. These classifiers are applied to the image descriptors extracted by another sub-net, making the whole network end-to-end trainable. Furthermore, we propose a novel re-weighting scheme to create an effective label correlation matrix that guides information propagation among the nodes of the GCN. Experiments on two multi-label image recognition datasets show that our approach clearly outperforms existing state-of-the-art methods. In addition, visualization analyses show that the classifiers learned by our model maintain a meaningful semantic topology.
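The core idea of the abstract above, a GCN propagating label word embeddings over a correlation graph to produce one classifier per label, can be sketched in a few lines. This is a minimal illustration with made-up sizes and random data, not the paper's implementation; all names and dimensions here are hypothetical.

```python
import numpy as np

def gcn_layer(A_hat, H, W):
    """One GCN propagation step: H' = ReLU(A_hat @ H @ W)."""
    return np.maximum(A_hat @ H @ W, 0.0)

# toy sizes (hypothetical): C labels, word-embedding dim d, image-feature dim D
C, d, D = 5, 16, 32
rng = np.random.default_rng(0)

E = rng.normal(size=(C, d))          # label word embeddings (graph node features)
A = rng.random((C, C)) < 0.3         # binary label co-occurrence graph
A_hat = A + np.eye(C)                # add self-loops
A_hat = A_hat / A_hat.sum(1, keepdims=True)   # row-normalize adjacency

W1 = rng.normal(size=(d, D))         # GCN weights: embeddings -> classifier space
classifiers = gcn_layer(A_hat, E, W1)         # (C, D): one classifier per label

x = rng.normal(size=(D,))            # image descriptor from a CNN backbone
scores = classifiers @ x             # (C,) multi-label scores
print(scores.shape)                  # (5,)
```

Because each classifier is produced by propagation over the label graph, the score for one label is influenced by the embeddings of correlated labels, which is what makes the classifiers "inter-dependent."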
Previous video summarization studies have focused on monocular videos, and their results degrade when applied directly to multi-view videos because of problems such as inter-view redundancy. In this paper, we present a method for summarizing multi-view videos. We construct a spatio-temporal shot graph and formulate summarization as a graph labeling task. The shot graph is derived from a hypergraph whose hyperedges encode correlations, across different attributes, among multi-view video shots. We then partition the shot graph via random walks, identifying clusters of event-centered shots with similar content. The summary is generated by solving a multi-objective optimization problem based on shot importance, which is evaluated with a Gaussian entropy fusion scheme. Different summarization objectives, such as minimum summary length and maximum information coverage, can be accommodated within the framework, and multi-level summarization is achieved simply by configuring the optimization parameters. We also propose the multi-view storyboard and the event board for presenting multi-view summaries. The storyboard naturally reflects correlations among summarized multi-view shots that describe the same important event; the event board serially assembles event-centered multi-view shots in temporal order. A single-video summary, which facilitates quick browsing of the summarized multi-view video, can easily be generated from the event-board representation.

Index Terms—Multi-objective optimization, multi-view video, random walks, spatio-temporal graph, video summarization.
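The trade-off the abstract describes between minimum summary length and maximum information coverage can be illustrated with a toy selection routine. This greedy sketch is a drastic, hypothetical simplification of the paper's multi-objective optimization; the function name, scores, and durations are all invented for illustration.

```python
import numpy as np

def select_shots(importance, lengths, budget):
    """Greedy sketch of the length-vs-coverage trade-off: pick shots by
    importance per second until the summary-length budget is spent.
    (A hypothetical stand-in for the paper's multi-objective optimization.)"""
    order = np.argsort(-importance / lengths)  # best importance density first
    chosen, used = [], 0.0
    for i in order:
        if used + lengths[i] <= budget:
            chosen.append(int(i))
            used += lengths[i]
    return sorted(chosen)

importance = np.array([0.9, 0.2, 0.7, 0.4])  # toy per-shot importance scores
lengths    = np.array([10., 5., 8., 4.])     # toy shot durations (seconds)
print(select_shots(importance, lengths, budget=15.0))  # [0, 3]
```

Tightening or loosening `budget` here plays the role of the paper's configurable optimization parameters: the same importance scores yield summaries at different levels of detail.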
The non-local denoising approach of Buades et al. obtains remarkable results, but at a high computational cost. In this paper, we propose a new algorithm that reduces the cost of computing the similarity between neighborhood windows. We first introduce an approximate measure of window similarity, and then accelerate its computation with an efficient Summed Square Image (SSI) scheme and the Fast Fourier Transform (FFT). Both theoretically and experimentally, our algorithm is about fifty times faster than the original non-local algorithm, while producing comparable results in terms of mean squared error (MSE) and perceptual image quality.
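The SSI idea can be sketched concretely: for a fixed translation, build an integral image of the squared pixel differences, after which the patch-to-patch sum of squared differences at every pixel costs only four table lookups. This is an illustrative sketch of the summed-area-table trick, not the paper's exact algorithm; the function name and shapes are assumptions.

```python
import numpy as np

def patch_ssd_map(u, t, r):
    """For a fixed translation t = (dy, dx), return the sum of squared
    differences between the (2r+1)x(2r+1) patches at p and p+t, for every
    valid p, using a Summed Square Image (integral image of squared diffs)."""
    dy, dx = t
    H, W = u.shape
    # squared-difference image for this translation (valid overlap region)
    d = (u[max(dy, 0):H + min(dy, 0), max(dx, 0):W + min(dx, 0)]
         - u[max(-dy, 0):H + min(-dy, 0), max(-dx, 0):W + min(-dx, 0)]) ** 2
    # summed-area table with a zero border row/column
    S = np.zeros((d.shape[0] + 1, d.shape[1] + 1))
    S[1:, 1:] = d.cumsum(0).cumsum(1)
    k = 2 * r + 1
    # every patch sum via four corner lookups: O(1) per pixel
    return S[k:, k:] - S[:-k, k:] - S[k:, :-k] + S[:-k, :-k]

# usage: rows of this toy image differ by 6, so every squared difference is 36
# and each 3x3 patch SSD is 9 * 36 = 324
u = np.arange(36.0).reshape(6, 6)
ssd = patch_ssd_map(u, (1, 0), 1)
print(ssd[0, 0])  # 324.0
```

The naive cost per translation is O(pixels x patch size); with the summed-area table it drops to O(pixels), which is where the bulk of the reported speed-up comes from.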