Sebastian Schaaf scite author profile

Sebastian Schaaf

16 Publications

27 Citation Statements Received

122 Citation Statements Given


Affiliations

German Center for Neurodegenerative Diseases, Fraunhofer Institute for Algorithms and Scientific Computing

Publications


Document Clustering using a Graph Covering with Pseudostable Sets

Dörpinghaus, Schaaf, Fluck, et al. 2017


Abstract: In text mining, document clustering describes the effort to assign unstructured documents to clusters, which usually correspond to topics. Clustering is widely used in science for data retrieval and organisation. In this paper we present a new graph-theoretical approach to document clustering and its application to a real-world data set. We show that the well-known partition of a graph into stable sets or cliques can be generalized to pseudostable sets or pseudocliques. This allows both soft and hard clustering. We present an integer linear programming formulation and a greedy approach for this NP-complete problem and discuss results on random instances and on real-world data for different similarity measures.


Integrative data semantics through a model-enabled data stewardship

Wegner, Schaaf, Uebachs, et al. 2022


Motivation: The importance of clinical data for understanding the pathophysiology of complex disorders has prompted multiple initiatives designed to generate patient-level data from various modalities. While these studies can reveal important findings relevant to the disease, each captures different yet complementary aspects and modalities which, when combined, give a more comprehensive picture of disease aetiology. Achieving this, however, requires global integration of data across studies, which is challenging given the lack of interoperability between cohort datasets.

Results: Here, we present the Data Steward Tool (DST), an application for semi-automatic semantic integration of clinical data into ontologies, global data models, and data standards. We demonstrate the applicability of the tool in dementia research by establishing a Clinical Data Model (CDM) for this domain. The CDM currently consists of 277 common variables covering demographics (e.g. age and gender), diagnostics, neuropsychological tests, and biomarker measurements. The DST combined with this disease-specific data model shows how interoperability between multiple heterogeneous dementia datasets can be achieved.

Availability: The DST source code and Docker images are available at https://github.com/SCAI-BIO/data-steward and https://hub.docker.com/r/phwegner/data-steward, respectively. The DST is also hosted at https://data-steward.bio.sca.fraunhofer.de/data-steward.

Supplementary information: Supplementary data are available at Bioinformatics online.


Deep Learning-based detection of psychiatric attributes from German mental health records

Madan, Zimmer, Balabin, et al. 2022

International Journal of Medical Informatics


PSB 2019 Workshop on Text Mining and Visualization for Precision Medicine

Gonzalez-Hernandez, Leaman, et al. 2018


Errors in level recorder data: Prevention and detection

Schaaf 1984

Journal of Hydrology


Soft document clustering using a novel graph covering approach

2018


Background: In text mining, document clustering describes the effort to assign unstructured documents to clusters, which usually correspond to topics. Clustering is widely used in science for data retrieval and organisation.

Results: In this paper we present and discuss a novel graph-theoretical approach to document clustering and its application to a real-world data set. We show that the well-known partition of a graph into stable sets or cliques can be generalized to pseudostable sets or pseudocliques. This allows both soft and hard clustering. The software is freely available on GitHub.

Conclusions: The presented integer linear programming formulation and greedy approach for this NP-complete problem lead to valuable results on random instances and on real-world data for different similarity measures. We show that PS-Document Clustering is a remarkable approach to document clustering that opens the complete toolbox of graph theory to this field.

Electronic supplementary material: The online version of this article (10.1186/s13040-018-0172-x) contains supplementary material, which is available to authorized users.
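This abstract builds on the classical hard clustering it generalizes: documents become vertices of a similarity graph, and a partition of that graph into cliques (equivalently, stable sets of the complement graph) yields disjoint clusters. The Python sketch below illustrates only that classical baseline. It is not the authors' pseudostable-set method, their integer linear programming model, or their released software. The token-set Jaccard similarity, the `threshold` parameter, and the names `similarity_graph` and `greedy_clique_partition` are hypothetical choices made for illustration.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets (an assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_graph(docs: list[str], threshold: float) -> dict[int, set[int]]:
    """Hypothetical helper: link two documents whose similarity is >= threshold."""
    tokens = [set(doc.lower().split()) for doc in docs]
    adj: dict[int, set[int]] = {i: set() for i in range(len(docs))}
    for i, j in combinations(range(len(docs)), 2):
        if jaccard(tokens[i], tokens[j]) >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def greedy_clique_partition(adj: dict[int, set[int]]) -> list[list[int]]:
    """Hypothetical helper: greedy hard clustering into disjoint cliques.

    Each vertex joins the first existing clique whose members are all its
    neighbours; otherwise it opens a new clique. This is the same as greedily
    colouring the complement graph, whose colour classes are stable sets.
    """
    cliques: list[list[int]] = []
    # Visit high-degree vertices first (ties broken by index) for determinism.
    for v in sorted(adj, key=lambda u: (-len(adj[u]), u)):
        for clique in cliques:
            if all(u in adj[v] for u in clique):
                clique.append(v)
                break
        else:
            cliques.append([v])
    return cliques


if __name__ == "__main__":
    docs = [
        "graph clustering documents",
        "graph clustering text documents",
        "clinical data model dementia",
        "clinical data dementia cohort",
    ]
    graph = similarity_graph(docs, threshold=0.5)
    print(greedy_clique_partition(graph))  # [[0, 1], [2, 3]]
```

A single greedy pass does not guarantee the minimum number of cliques, which is one reason an exact formulation such as integer linear programming is attractive for this NP-complete problem. Because every document lands in exactly one clique, this sketch produces only a hard clustering; the paper's generalization to pseudostable sets and pseudocliques, which enables soft clustering, is not modelled here.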


An efficient approach towards the generation and analysis of interoperable clinical data in a knowledge graph

Dörpinghaus, Weil, Schaaf, et al. 2021


Knowledge graphs have been shown to play an important role in recent knowledge mining settings, for example in the life sciences and bioinformatics. Contextual information is widely used for NLP and knowledge discovery tasks, since it strongly influences the exact meaning of expressions and of queries on data. The contributions of this paper are (1) an efficient approach towards interoperable data, (2) a runtime analysis of 14 real-world use cases represented by graph queries, and (3) a unique view on clinical data and its application, combining methods of algorithmic optimisation, graph theory and data science.


Knowledge Discovery and AI Approaches for the Life Sciences

Apke, Weil, Dörpinghaus, et al. 2022


