Toni Gruetze scite author profile

Toni Gruetze

4 Publications

55 Citation Statements Received

23 Citation Statements Given


Affiliations

Hasso Plattner Institute, University of Potsdam

Publications


Profiling and mining RDF data with ProLOD++

Abedjan, Gruetze, Jentzsch et al. 2014


Before reaping the benefits of open data to add value to an organization's internal data, such new, external datasets must be analyzed and understood at the basic level of data types, constraints, value patterns, etc. Such data profiling, already difficult for large relational data sources, is even more challenging for RDF datasets, the preferred data model for linked open data.

We present ProLOD++, a novel tool for various profiling and mining tasks to understand and ultimately improve open RDF data. ProLOD++ comprises various traditional data profiling tasks, adapted to the RDF data model. In addition, it features many specific profiling results for open data, such as schema discovery for user-generated attributes, association rule discovery to uncover synonymous predicates, and uniqueness discovery along ontology hierarchies. ProLOD++ is highly efficient, allowing interactive profiling for users interested in exploring the properties and structure of yet unknown datasets.

I. PROFILING LINKED OPEN DATA

At the time of writing, Linked Open Data (LOD) as compiled in http://linkeddata.org already comprised more than 300 data sources, including prominent examples such as DBpedia, YAGO, and Freebase. A LOD dataset is usually represented in the Resource Description Framework (RDF), embodying an entity-relationship graph or a set of triplified facts consisting of subjects, predicates, and objects. Most of the datasets are openly available and connected to each other via sameAs links between representations of the same real-world entities. Hundreds more open RDF datasets are listed, for instance, at http://datahub.io.

However, consuming LOD is not easy, because the sources are heterogeneous, often inconsistent, and often lack even basic metadata. One of the main reasons for this problem is that many of the data sources, such as DBpedia [7] or YAGO [13], have been extracted from unstructured data. Furthermore, a knowledge base usually evolves over time as more facts and entities are added, and rigid schema and ontology definitions, hand-crafted at some point in time, lose validity over all entities of the dataset. Hence it is vital to thoroughly examine and understand each dataset, its structure, and its properties before usage.

Manually inspecting datasets can achieve this goal only to a limited extent: algorithms and tools are needed that profile the dataset to retrieve relevant and interesting metadata by analyzing the entire dataset [14]. Indeed, there are many commercial tools, such as IBM's Information Analyzer, Microsoft's SQL Server Integration Services (SSIS), or Informatica's Data Explorer, and some research prototypes, such as [12], for profiling relational datasets. However, all of these tools were designed to profile relational data. LOD, which is represented in RDF, has a very different nature and calls for specific profiling and mining techniques. Current tools for working with RDF data are limited to graph visualization and editing: LODlive is a browser-based tool to browse and search in RDF datasets. RDF Pro...


CohEEL: Coherent and efficient named entity linking through random walks

Gruetze, Kasneci, Zuo et al. 2016

Journal of Web Semantics


Topic Shifts in StackOverflow: Ask it Like Socrates

Gruetze, Krestel, Naumann 2016


CohEEL: Coherent and Efficient Named Entity Linking Through Random Walks

et al. 2016


