RDF graph summarization for first-sight structure discovery

Goasdoué, François; Guzewicz, Pawel; Manolescu, Ioana

doi:10.1007/s00778-020-00611-y

Cited by 23 publications

(34 citation statements)

References 32 publications


“…Structural graph summaries are used for various different tasks such as cardinality estimations for queries in graph databases [46], data exploration [6,43,48,53], data visualization [25], vocabulary term recommendations [51], related entity retrieval [16], and query answering in data search [27]. The distinguishing characteristic of structural graph summaries is that they partition the set of vertices in a graph based on equivalences of subgraphs [10].…”

Section: Introduction (mentioning)

confidence: 99%

Incremental and Parallel Computation of Structural Graph Summaries for Evolving Graphs

Blume

Richerby

Scherp

2020

Proceedings of the 29th ACM International Conference on Information & Knowledge Management


Graph summarization is the task of finding condensed representations of graphs such that a chosen set of (structural) subgraph features in the graph summary are equivalent to the input graph. Existing graph summarization algorithms are tailored to specific graph summary models, only support one-time batch computation, are designed and implemented for a specific task, or evaluated using static graphs. Our novel, incremental, parallel algorithm addresses all these shortcomings. We support various structural graph summary models defined in our formal language FLUID. All graph summaries defined with FLUID can be updated in time O(Δ · d^k), where Δ is the number of additions, deletions, and modifications to the input graph, d is its maximum degree, and k is the maximum distance in the subgraphs considered. We empirically evaluate the performance of our algorithm on benchmark and real-world datasets. Our experiments show that, for commonly used summary models and datasets, the incremental summarization algorithm almost always outperforms its batch counterpart, even when about 50% of the graph database changes. The source code and the experimental results are openly available for reproducibility and extensibility.
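As a rough illustration of what a structural graph summary does, the sketch below partitions the subjects of a small triple set by one very simple equivalence: two subjects are equivalent when they have the same set of outgoing properties. This is only an assumed toy model, not FLUID and not the incremental algorithm from the abstract; the function name `summarize` and the `ex:` identifiers are hypothetical.

```python
from collections import defaultdict


def summarize(triples):
    """Partition subjects by their set of outgoing properties.

    Hypothetical toy equivalence (an assumption for illustration):
    subjects with identical property sets fall into the same class.
    """
    # Collect the outgoing properties of every subject.
    props = defaultdict(set)
    for subject, predicate, _obj in triples:
        props[subject].add(predicate)

    # Group subjects whose property sets are identical.
    classes = defaultdict(set)
    for subject, predicates in props.items():
        classes[frozenset(predicates)].add(subject)
    return dict(classes)


# Made-up example data, not taken from any dataset mentioned on this page.
triples = [
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:bob", "ex:knows", "ex:alice"),
    ("ex:paper1", "ex:title", "A title"),
]

summary = summarize(triples)
for predicates, members in summary.items():
    print(sorted(predicates), "->", sorted(members))
```

Here `ex:alice` and `ex:bob` land in one class and `ex:paper1` in another, so the summary has two nodes for three subjects. A batch approach would rerun `summarize` after every change, which is the cost an incremental algorithm aims to avoid.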


“…In particular, according to what has been reported in [43], it is approximately 6 times larger than the more significant dataset processed by DistLODStat (i.e., 200 GB). Other profiling approaches, such as [20], experimented with real and synthetic graphs of up to 36.5 GB (approx. 32 times smaller than makg), while [17] is evaluated on 6 datasets where the biggest one has the size of 56 MB (approx. 21,125 times smaller than makg).…”

Section: Abstat-hd Versus Related Work (mentioning)

confidence: 99%

“…16 Moreover, it uses 5 external types from the fabio ontology 17 (Book, BookChapter, ConferencePaper, JournalArticle, and PatentDocument) and 25 external properties (from ontologies fabio, purl, 18 cito, 19 dbpedia, etc.). The Microsoft Academic Knowledge Graph maintainers have published also the schema 20 as an easier way to visualize relations among types and datatypes. From this schema, a user can easily notice that the KG makes use of two owl:equivalentClass: one between makg:FieldOfStudy and fabio:SubjectDiscipline and the other between makg:Paper and fabio:Work.…”

Section: Potential Errors Detected In the Makg (mentioning)

confidence: 99%

“…From this schema, a user can easily notice that the KG makes use of two owl:equivalentClass: one between makg:FieldOfStudy and fabio:SubjectDiscipline and the other between makg:Paper and fabio:Work. However, both relations are present only in the schema depicted in their website, but they are both missing in the owl ontology. All the external types used in the dataset from fabio ontology (Book, BookChapter, ConferencePaper, JournalArticle, and PatentDocument) are subtypes of the class fabio:Expressions.…”

Footnotes in the quoted passage: 16 http://ma-graph.org/ontology.owl; 17 https://sparontologies.github.io/fabio/current/fabio.html; 18 http://purl.org/dc/terms/; 19 http://purl.org/spar/cito; 20 http://ma-graph.org/schema-linked-dataset-descriptions/

Section: Potential Errors Detected In the Makg (mentioning)

confidence: 99%


ABSTAT-HD: a scalable tool for profiling very large knowledge graphs

et al. 2021


Processing large-scale and highly interconnected Knowledge Graphs (KG) is becoming crucial for many applications such as recommender systems, question answering, etc. Profiling approaches have been proposed to summarize large KGs with the aim to produce concise and meaningful representation so that they can be easily managed. However, constructing profiles and calculating several statistics such as cardinality descriptors or inferences are resource expensive. In this paper, we present ABSTAT-HD, a highly distributed profiling tool that supports users in profiling and understanding big and complex knowledge graphs. We demonstrate the impact of the new architecture of ABSTAT-HD by presenting a set of experiments that show its scalability with respect to three dimensions of the data to be processed: size, complexity and workload. The experimentation shows that our profiling framework provides informative and concise profiles, and can process and manage very large KGs.


“…RDF summarization can be performed using a wide range of different techniques based on different dimensions of the target graph. Even if most of the current techniques rely at some point on concepts of node importance or relevance, some techniques, such as pattern-mining methods [42] or quotient summaries [43, 44] may not use importance metrics.…”

Section: Related Work (mentioning)

confidence: 99%

Approaches to measure class importance in Knowledge Graphs

et al. 2021


The amount, size, complexity, and importance of Knowledge Graphs (KGs) have increased during the last decade. Many different communities have chosen to publish their datasets using Linked Data principles, which favors the integration of this information with many other sources published using the same principles and technologies. Such a scenario requires to develop techniques of Linked Data Summarization. The concept of a class is one of the core elements used to define the ontologies which sustain most of the existing KGs. Moreover, classes are an excellent tool to refer to an abstract idea which groups many individuals (or instances) in the context of a given KG, which is handy to use when producing summaries of its content. Rankings of class importance are a powerful summarization tool that can be used both to obtain a superficial view of the content of a given KG and to prioritize many different actions over the data (data quality checking, visualization, relevance for search engines…). In this paper, we analyze existing techniques to measure class importance and propose a novel approach called ClassRank. We compare the class usage in SPARQL logs of different KGs with the importance ranking produced by the approaches evaluated. Then, we discuss the strengths and weaknesses of the evaluated techniques. Our experimentation suggests that ClassRank outperforms state-of-the-art approaches measuring class importance.

