While advances in computing resources have made processing enormous amounts of data possible, human ability to identify patterns in such data has not scaled accordingly. Efficient computational methods for condensing and simplifying data are thus becoming vital for extracting actionable insights. In particular, while data summarization techniques have been studied extensively, only recently has summarizing interconnected data, or graphs, become popular. This survey is a structured, comprehensive overview of the state-of-the-art methods for summarizing graph data. We first broach the motivation behind, and the challenges of, graph summarization. We then categorize summarization approaches by the type of graphs taken as input and further organize each category by core methodology. Finally, we discuss applications of summarization on real-world graphs and conclude by describing some open problems in the field.
We present CODEX, a set of knowledge graph COmpletion Datasets EXtracted from Wikidata and Wikipedia that improve upon existing knowledge graph completion benchmarks in scope and level of difficulty. In terms of scope, CODEX comprises three knowledge graphs varying in size and structure, multilingual descriptions of entities and relations, and tens of thousands of hard negative triples that are plausible but verified to be false. To characterize CODEX, we contribute thorough empirical analyses and benchmarking experiments. First, we analyze each CODEX dataset in terms of logical relation patterns. Next, we report baseline link prediction and triple classification results on CODEX for five extensively tuned embedding models. Finally, we differentiate CODEX from the popular FB15K-237 knowledge graph completion dataset by showing that CODEX covers more diverse and interpretable content, and is a more difficult link prediction benchmark. Data, code, and pretrained models are available at https://bit.ly/2EPbrJs.
The increasing scale of encyclopedic knowledge graphs (KGs) calls for summarization as a way to help users efficiently access and distill world knowledge. Motivated by the disparity between individuals' limited information needs and the massive scale of KGs, in this paper we propose a new problem called personalized knowledge graph summarization. The goal is to construct compact "personal summaries" of KGs containing only the facts most relevant to individuals' interests. Such summaries can be stored and utilized on-device, allowing individuals private, anytime access to the information that interests them most.We formalize the problem as one of constructing a sparse graph, or summary, that maximizes a user's inferred "utility" over a given KG, subject to a user-and device-specific constraint on the summary's size. To solve it, we propose GLIMPSE, a summarization framework that provides theoretical guarantees on the summary's utility and is linear in the number of edges in the KG. In an evaluation with real user queries to opensource, encyclopedic KGs of up to one billion triples, we show that GLIMPSE efficiently creates summaries that outperform strong baselines by up to 19% in query answering F1 score.
Codifying commonsense knowledge in machines is a longstanding goal of artificial intelligence. Recently, much progress toward this goal has been made with automatic knowledge base (KB) construction techniques. However, such techniques focus primarily on the acquisition of positive (true) KB statements, even though negative (false) statements are often also important for discriminative reasoning over commonsense KBs. As a first step toward the latter, this paper proposes NegatER, a framework that ranks potential negatives in commonsense KBs using a contextual language model (LM). Importantly, as most KBs do not contain negatives, NegatER relies only on the positive knowledge in the LM and does not require ground-truth negative examples. Experiments demonstrate that, compared to multiple contrastive data augmentation approaches, NegatER yields negatives that are more grammatical, coherent, and informative-leading to statistically significant accuracy improvements in a challenging KB completion task and confirming that the positive knowledge in LMs can be "repurposed" to generate negative knowledge.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.