2022
DOI: 10.1093/bioinformatics/btac375
Integrative data semantics through a model-enabled data stewardship

Abstract: Motivation: The importance of clinical data in understanding the pathophysiology of complex disorders has prompted the launch of multiple initiatives designed to generate patient-level data from various modalities. While these studies can reveal important findings relevant to the disease, each study captures different yet complementary aspects and modalities which, when combined, generate a more comprehensive picture of disease aetiology. However, achieving this requires a global integration o…

Cited by 9 publications (9 citation statements)
References 3 publications
“…To date, to our knowledge, there has been no automatic NLP-based harmonization of cohort studies conducted beyond standard string-matching techniques [10] or manual curation [40]. The application of such a standard technique (string-matching) led to relatively poor performance in the present study.…”
Section: Discussion
confidence: 88%
“…Such a data catalog did not address the available variables on a granular level and the variable harmonization aspect across cohort studies. Another study by Wegner et al (2022) established a semi-automatic DST using a string-matching technique for the harmonization of clinical datasets and applied it in the field of dementia [10]. However, despite previous efforts, there is currently no model or tool enabling fully automatic harmonization in the AD field.…”
Section: Introduction
confidence: 99%
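The string-matching baseline that these citations discuss can be illustrated with a minimal sketch. The cohort variable names, the lexical similarity measure (Python's standard-library difflib) and the 0.5 threshold below are illustrative assumptions, not details taken from the cited studies.

```python
# Naive string-matching harmonization sketch: pair each variable name
# from one cohort with its lexically most similar name in another.
# All variable names and the threshold are invented for illustration.
from difflib import SequenceMatcher

def match_variables(source_vars, target_vars, threshold=0.5):
    """Map each source variable to its most similar target variable."""
    mappings = {}
    for src in source_vars:
        best, best_score = None, 0.0
        for tgt in target_vars:
            score = SequenceMatcher(None, src.lower(), tgt.lower()).ratio()
            if score > best_score:
                best, best_score = tgt, score
        if best is not None and best_score >= threshold:
            mappings[src] = best
    return mappings

cohort_a = ["age_at_baseline", "mmse_total", "apoe_genotype"]
cohort_b = ["baseline_age", "MMSE_score", "education_years"]
mappings = match_variables(cohort_a, cohort_b)
```

Purely lexical matching like this misses semantically equivalent variables with dissimilar names (here, nothing matches "apoe_genotype"), which is the weakness the quoted statements point out.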
“…2210.16649) [17]. In addition, the original cohort pseudonyms have been replaced and the random assignment to the new identifier is not stored.…”
Section: Creation of the Synthetic Cohort
confidence: 99%
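A minimal sketch of the re-pseudonymization step described above: original pseudonyms are replaced by fresh random identifiers, and the old-to-new assignment is never stored, so records cannot be linked back. The identifier format and record layout are invented for illustration.

```python
# Re-pseudonymization sketch: assign each record a fresh random ID
# and discard the old->new mapping. Field names are illustrative.
import secrets

def repseudonymize(records, id_field="pseudonym"):
    """Return copies of records with fresh, unlinkable identifiers."""
    new_records = []
    used = set()
    for rec in records:
        while True:  # draw until the random ID is unique in this batch
            new_id = f"SYN-{secrets.token_hex(4)}"
            if new_id not in used:
                used.add(new_id)
                break
        clean = dict(rec)
        clean[id_field] = new_id  # the old->new assignment is not kept
        new_records.append(clean)
    return new_records

cohort = [{"pseudonym": "P001", "sara_score": 12.5},
          {"pseudonym": "P002", "sara_score": 7.0}]
synthetic = repseudonymize(cohort)
```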
“…To enable a wide scientific community of ataxia researchers to browse and explore this integrated dataset at a first glance in SCAview, we had to shuffle the data for data protection reasons and because parts of the data are unpublished. Correlations and distributions within the created synthetic cohort are largely kept [17]. To ensure that no subject can be re-identified even with the creation of this synthetic version of a virtual cohort, we did not include the study site in SCAview.…”
Section: Integrated Data Set
confidence: 99%
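The shuffling idea can be sketched, naively, as an independent per-column permutation: this preserves each variable's marginal distribution exactly, but, unlike the correlation-preserving method the quote cites [17], it does not keep cross-variable correlations. Column names and values below are illustrative.

```python
# Naive disclosure-control sketch: permute each column independently.
# Marginal distributions survive exactly; correlations do not, which
# is why the cited work uses a more elaborate synthesis method.
import random

def shuffle_columns(table, seed=None):
    """Independently permute each column of a dict-of-lists table."""
    rng = random.Random(seed)
    shuffled = {}
    for name, values in table.items():
        col = list(values)
        rng.shuffle(col)
        shuffled[name] = col
    return shuffled

cohort = {"age": [54, 61, 47, 58], "sara_score": [9.0, 14.5, 6.0, 11.0]}
synthetic = shuffle_columns(cohort, seed=42)
```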
“…This CDM was built based on the GECCO—German Corona Consensus Dataset (Sass et al., 2020)—and extended with 10 other datasets collected globally. The CDM, in combination with the Data Steward Tool (DST) (https://doi.org/10.1093/bioinformatics/btac375) (Wegner et al., 2021), forms an end-to-end data standardization pipeline that can read data, standardize it with a common data standard (CDM) and then export it to well-established health IT formats such as FHIR via RESTful interfaces. Throughout the development, we used the DST to map data from different sources to (i) unify and standardize datasets to standard terminologies and ontologies, (ii) further enrich the CDM with variable mappings and (iii) map and compare with other global data standards like OMOP (https://www.ohdsi.org/data-standardization/the-common-data-model/).…”
Section: Introduction
confidence: 99%
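The variable-mapping idea behind such a pipeline can be sketched as a lookup from source field names onto common-data-model variables with standard terminology codes. The mapping table, field names and codes below are illustrative assumptions, not details from the DST, GECCO or OMOP.

```python
# Illustrative CDM-mapping sketch: rename raw source fields to common
# data model variables and attach a terminology code to each value.
# Fields and codes are invented for illustration.
CDM_MAPPING = {
    # source field -> (CDM variable, terminology code)
    "patient_age": ("age", "LOINC:30525-0"),
    "sex": ("biological_sex", "LOINC:46098-0"),
}

def to_cdm(record):
    """Map a raw source record onto the common data model."""
    out = {}
    for src_field, value in record.items():
        if src_field in CDM_MAPPING:
            cdm_var, code = CDM_MAPPING[src_field]
            out[cdm_var] = {"value": value, "code": code}
    return out

raw = {"patient_age": 67, "sex": "female", "site": "Bonn"}
standardized = to_cdm(raw)
```

Unmapped fields (here, "site") are dropped rather than passed through, mirroring the idea that only curated, standardized variables enter the common model before export to formats such as FHIR.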