The task of video grounding, which temporally localizes a natural language description in a video, plays an important role in video understanding. Existing studies have adopted strategies of sliding a window over the entire video or exhaustively ranking all possible clip-sentence pairs in a pre-segmented video, and thus inevitably suffer from a large number of enumerated candidates. To alleviate this problem, we formulate the task as a sequential decision-making problem by learning an agent that progressively regulates the temporal grounding boundaries according to its policy. Specifically, we propose a reinforcement learning based framework improved by multi-task learning, which shows steady performance gains when additional supervised boundary information is considered during training. Our framework achieves state-of-the-art performance on the ActivityNet'18 DenseCaption dataset (Krishna et al. 2017) and the Charades-STA dataset (Sigurdsson et al. 2016; Gao et al. 2017) while observing only 10 or fewer clips per video.
Graphs are widely used to model complicated data semantics in many applications in bioinformatics, chemistry, social networks, pattern recognition, etc. A recent trend is to tolerate noise arising from various sources, such as erroneous data entry, and find similarity matches. In this paper, we study the graph similarity join problem, which returns pairs of graphs whose edit distances are no larger than a threshold. Inspired by the q-gram idea for the string similarity problem, our solution extracts paths from graphs as features for indexing. We establish a lower bound on the number of common features required to generate candidates. An efficient algorithm is proposed that exploits both matching and mismatching features to improve the filtering and verification of candidates. We demonstrate through extensive experiments on publicly available datasets that the proposed algorithm significantly outperforms existing approaches.
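The count-filtering idea above can be sketched in a few lines. This is a minimal illustration, not the paper's algorithm: `path_features` enumerates label sequences of simple paths with `q` edges as features, and `count_filter` applies a hypothetical lower bound that assumes each edit operation can invalidate at most `d_max` path features (in the paper this bound depends on vertex degrees).

```python
from collections import Counter

def path_features(adj, labels, q):
    """Enumerate label sequences of all simple paths with q edges
    (a path-based analogue of string q-grams; sketch only)."""
    feats = Counter()
    def dfs(path):
        if len(path) == q + 1:
            feats[tuple(labels[v] for v in path)] += 1
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:          # keep paths simple
                dfs(path + [nxt])
    for v in adj:
        dfs([v])
    return feats

def count_filter(f1, f2, tau, d_max):
    """Keep a pair only if the graphs share enough path features.
    Assumes (hypothetically) that one edit operation invalidates at
    most d_max features, giving the bound max(|f1|,|f2|) - tau*d_max."""
    common = sum((f1 & f2).values())
    bound = max(sum(f1.values()), sum(f2.values())) - tau * d_max
    return common >= bound
```

Pairs failing the bound are pruned without ever computing the (NP-hard) graph edit distance; survivors proceed to verification.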
Learning an ideal metric is crucial to many tasks in computer vision. Diverse feature representations can address this problem from different aspects: visual data objects described by multiple features can be decomposed into multiple views, which often provide complementary information. In this paper, we propose a cross-view fusion algorithm that yields a similarity metric for multiview data by systematically fusing multiple similarity measures. Unlike existing paradigms, we focus on learning a distance measure by exploiting a graph structure over the data samples, where an input similarity matrix is improved through propagation via graph random walk. In particular, we construct multiple graphs, each corresponding to an individual view, and present a cross-view fusion approach based on graph random walk to derive an optimal distance measure by fusing multiple metrics. Our method scales to large amounts of data by enforcing sparsity through an anchor graph representation. To adaptively control the effects of different views, we dynamically learn view-specific coefficients, which are leveraged in the graph random walk to balance the views. However, such a strategy may lead to an over-smooth similarity metric, where affinities between dissimilar samples are enlarged by excessive cross-view fusion. We therefore devise a heuristic for controlling the number of iterations in the fusion process to avoid over-smoothing. Extensive experiments on real-world data sets validate the effectiveness and efficiency of our approach.
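A minimal sketch of cross-view fusion via graph random walk, under simplifying assumptions (uniform view weights instead of the learned view-specific coefficients, and no anchor-graph sparsification): each view's similarity matrix is row-normalized into a transition matrix, then repeatedly diffused through the average of the other views. Keeping `steps` small reflects the paper's point about avoiding over-smoothing.

```python
import numpy as np

def fuse_views(sims, steps=2):
    """Cross-view fusion by graph random walk (simplified sketch:
    uniform view weights; the paper learns view coefficients and
    uses an anchor graph for scalability, both omitted here)."""
    # Row-normalize each view's similarity into a transition matrix.
    Ps = [S / S.sum(axis=1, keepdims=True) for S in sims]
    Ss = [P.copy() for P in Ps]
    for _ in range(steps):              # few steps: avoids over-smoothing
        new = []
        for v, P in enumerate(Ps):
            others = [Ss[u] for u in range(len(Ps)) if u != v]
            cross = sum(others) / len(others)
            new.append(P @ cross @ P.T) # diffuse through the other views
        Ss = new
    return sum(Ss) / len(Ss)            # fused similarity metric
```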
Entity alignment (EA) identifies entities that refer to the same real-world object but are located in different knowledge graphs (KGs), and has been harnessed for KG construction and integration. When generating EA results, current embedding-based solutions treat entities independently and fail to take into account the interdependence between entities. In addition, most embedding-based EA methods either fuse different features at the representation level and generate a unified entity embedding for alignment, which potentially causes information loss, or aggregate features at the outcome level with hand-tuned weights, which is not practical as the number of features grows. To tackle these deficiencies, we propose a collective embedding-based EA framework with an adaptive feature fusion mechanism. We first employ three representative features, i.e., structural, semantic and string signals, to capture different aspects of the similarity between entities in heterogeneous KGs. These features are then integrated at the outcome level, with dynamically assigned weights generated by our carefully devised adaptive feature fusion strategy. Eventually, in order to make collective EA decisions, we formulate EA as the classical stable matching problem between entities to be aligned, with preference lists constructed from the fused feature matrix; it is then solved effectively by the deferred acceptance algorithm. Our proposal is evaluated on both cross-lingual and mono-lingual EA benchmarks against state-of-the-art solutions, and the empirical results verify its effectiveness and superiority. We also perform an ablation study to gain insights into the framework modules.
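The final matching step rests on the classical Gale-Shapley deferred acceptance algorithm, which is well defined independently of this paper. A minimal sketch, assuming a square fused similarity matrix `sim[i][j]` between source and target entities (the paper builds preference lists from exactly such a matrix):

```python
def deferred_acceptance(sim):
    """Stable one-to-one matching of source to target entities via
    Gale-Shapley deferred acceptance; preference lists are derived
    from the fused similarity matrix sim[i][j] (illustrative sketch)."""
    n = len(sim)
    # Each source ranks targets by descending similarity.
    prefs = [sorted(range(n), key=lambda j: -sim[i][j]) for i in range(n)]
    next_prop = [0] * n          # next target each source will propose to
    match_of = [None] * n        # source currently held by each target
    free = list(range(n))
    while free:
        i = free.pop()
        j = prefs[i][next_prop[i]]
        next_prop[i] += 1
        cur = match_of[j]
        if cur is None:
            match_of[j] = i                 # target accepts tentatively
        elif sim[i][j] > sim[cur][j]:       # target prefers new proposer
            match_of[j] = i
            free.append(cur)                # previous holder is freed
        else:
            free.append(i)                  # proposal rejected
    return {match_of[j]: j for j in range(n)}
```

The result is stable: no source-target pair would both prefer each other over their assigned partners, which is what makes the alignment decisions collective rather than independent.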
A systematic investigation of the nanoparticle-enhanced light trapping in thin-film silicon solar cells is reported. The nanoparticles are fabricated by annealing a thin Ag film on the cell surface. An optimisation roadmap for the plasmon-enhanced light-trapping scheme for self-assembled Ag metal nanoparticles is presented, including a comparison of rear-located and front-located nanoparticles, an optimisation of the precursor Ag film thickness, an investigation on different conditions of the nanoparticle dielectric environment and a combination of nanoparticles with other supplementary back-surface reflectors. Significant photocurrent enhancements have been achieved because of high scattering and coupling efficiency of the Ag nanoparticles into the silicon device. For the optimum light-trapping scheme, a short-circuit current enhancement of 27% due to Ag nanoparticles is achieved, increasing to 44% for a "nanoparticle/magnesium fluoride/diffuse paint" back-surface reflector structure. This is 6% higher compared with our previously reported plasmonic short-circuit current enhancement of 38%.
Graphs are widely used to model complex data in many applications, such as bioinformatics, chemistry, social networks, pattern recognition, etc. A fundamental and critical query primitive is to efficiently search for similar structures in a large collection of graphs. This paper studies graph similarity queries with edit distance constraints. Existing solutions to the problem utilize fixed-size overlapping substructures to generate candidates, and thus become susceptible to large vertex degrees or large distance thresholds. In this paper, we present a partition-based approach to tackle the problem. By dividing data graphs into variable-size non-overlapping partitions, the edit distance constraint is converted to a graph containment constraint for candidate generation. We develop efficient query processing algorithms based on the new paradigm. A candidate pruning technique and an improved graph edit distance algorithm are also developed to further boost performance. In addition, a cost-aware graph partitioning technique is devised to optimize the index. Extensive experiments demonstrate that our approach significantly outperforms existing approaches.
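The pigeonhole argument behind the partition-based filter can be sketched as follows: if the query is split into tau+1 non-overlapping partitions, then tau edit operations can destroy at most tau of them, so any data graph within edit distance tau must contain at least one partition unchanged. For brevity the containment test below is relaxed to a vertex-label multiset check; the paper performs exact subgraph containment:

```python
from collections import Counter

def partition_filter(partitions, data_labels, tau):
    """Pigeonhole filter from the partition-based scheme: with tau+1
    non-overlapping query partitions, a graph within edit distance tau
    must fully contain at least one partition. Containment is relaxed
    here to a vertex-label multiset test (sketch only)."""
    assert len(partitions) == tau + 1
    data = Counter(data_labels)
    for part in partitions:
        if not Counter(part) - data:   # every label of this part appears
            return True                # candidate survives
    return False                       # safely pruned
```

Because the relaxation only over-approximates containment, the filter never prunes a true match; exact containment checks would prune more aggressively.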
Entity alignment (EA) finds equivalent entities that are located in different knowledge graphs (KGs), which is an essential step to enhance the quality of KGs, and hence of significance to downstream applications (e.g., question answering and recommendation). Recent years have witnessed a rapid increase of EA approaches, yet their relative performance remains unclear, partly due to incomplete empirical evaluations, as well as the fact that comparisons were carried out under different settings (i.e., datasets, information used as input, etc.). In this paper, we fill in the gap by conducting a comprehensive evaluation and detailed analysis of state-of-the-art EA approaches. We first propose a general EA framework that encompasses all the current methods, and then group existing methods into three major categories. Next, we judiciously evaluate these solutions on a wide range of use cases, based on their effectiveness, efficiency and robustness. Finally, we construct a new EA dataset to mirror the real-life challenges of alignment, which were largely overlooked by existing literature. This study strives to provide a clear picture of the strengths and weaknesses of current EA approaches, so as to inspire quality follow-up research.
1. As it stands, EA can be deemed a special case of entity resolution (ER), which recalls a large body of literature (to be discussed in Section 2.2). Thus, some ER methods (with minor adaptation to handle EA) are also included in this study to ensure the comprehensiveness of the research.
In this article, we provide an empirical evaluation of state-of-the-art EA approaches with the following features: (1) Fair comparison within and across categories. Almost all recent studies [5], [24], [38], [55], [60], [61], [62], [63], [67] are confined to comparing with only a subset of methods.
In addition, different approaches follow different settings: some merely use the KG structure for alignment, while others also utilize additional information; some align KGs in one pass, while others employ an iterative (re-)training strategy. Although a direct comparison of these methods, as reported in the literature, demonstrates the overall effectiveness of the solutions, a preferable and fairer practice is to group the methods into categories and then compare the results both within and across categories. In this study, we include most state-of-the-art methods for lateral comparison, including very recent efforts that have not yet been compared with others. By dividing them into three groups and conducting detailed analysis on both intra- and inter-group evaluations, we are able to better position these approaches and assess their effectiveness. (2) Comprehensive evaluation on representative datasets. To evaluate the performance of EA systems, several datasets have been constructed, which can be broadly categorized into cross-lingual benchmarks, represented by DBP15K [53], and mono-lingual benchmarks, represented by DWY100K [54]. A very recent study [24] points out th...
Entity alignment (EA) aims to discover equivalent entities in knowledge graphs (KGs), which bridges heterogeneous sources of information and facilitates the integration of knowledge. Existing EA solutions mainly rely on structural information to align entities, typically through KG embedding. Nonetheless, in real-life KGs, only a few entities are densely connected to others, while the vast majority possess a rather sparse neighborhood structure. We refer to the latter as long-tail entities, and observe that this phenomenon arguably limits the use of structural information for EA. To mitigate the issue, we revisit and investigate the conventional EA pipeline. For pre-alignment, we propose to amplify long-tail entities, which have relatively weak structural information, with entity name information that is generally available (but overlooked), in the form of concatenated power mean word embeddings. For alignment, under a novel complementary framework that consolidates structural and name signals, we identify an entity's degree as important guidance to effectively fuse the two different sources of information. To this end, a degree-aware co-attention network is conceived, which dynamically adjusts the significance of features in a degree-aware manner. For post-alignment, we propose to complement the original KGs with facts from their counterparts, using confident EA results as anchors via iterative training. Comprehensive experimental evaluations validate the superiority of our proposed techniques.
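The degree-aware intuition can be illustrated with a toy stand-in for the co-attention network: the actual model learns its weighting, whereas here a hand-set logistic gate on the entity's degree, with an assumed soft threshold `k`, interpolates between the two signals.

```python
import math

def degree_aware_fuse(struct_sim, name_sim, degree, k=5.0):
    """Degree-gated fusion of structural and name similarity (toy
    stand-in for the paper's degree-aware co-attention network).
    Long-tail entities (low degree) lean on the name signal; densely
    connected ones lean on structure. k is an assumed soft threshold."""
    w = 1.0 / (1.0 + math.exp(-(degree - k)))   # weight on structure
    return w * struct_sim + (1.0 - w) * name_sim
```

A long-tail entity with degree near zero thus scores almost entirely by name similarity, matching the observation that sparse structure alone is unreliable for such entities.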