2023
DOI: 10.1101/2023.02.15.528673
Preprint

Clustering rare diseases within an ontology-enriched knowledge graph

Abstract: Objective: Identifying sets of rare diseases with shared aspects of etiology and pathophysiology may enable drug repurposing and/or platform-based therapeutic development. Toward that aim, we utilized an integrative knowledge graph-based approach to constructing clusters of rare diseases. Materials and Methods: Data on 3,242 rare diseases were extracted from the National Center for Advancing Translational Sciences (NCATS) Genetic and Rare Diseases Information Center (GARD) internal data resources. The rare diseas…

Cited by 8 publications (4 citation statements)
References 31 publications
“…Deep learning and AI is a rapidly evolving field, and the transformer-based architecture we used for this analysis may not be the optimal way to learn the semantic structure of EHR diagnoses. Recent studies have proposed new ways of derived phenotype embeddings, including extracting them from curated knowledge graphs or from general-purpose large language models pretrained on non-EHR data [68][69][70]. There are also a variety of approaches that have been used to process time series data from EHR in machine learning applications, including neural network models designed for time series data such as recurrent neural networks and using pretrained large language models to process EHR [71,72].…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…Detailed description of the disease clustering procedure has been described in a separate submission [27]. We extracted 92 subgraphs from the NGKG, each an ego graph [28] of radius of 3 centered on a node containing one of those 92 GBM-related rare diseases.…”
Section: B. GBM-Based Biomedical Profile Network (GBPN)
Citation type: mentioning
confidence: 99%
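
The citation statement above refers to ego graphs of radius 3 without showing how one is extracted. As a rough illustration only (not the citing authors' code), the sketch below assumes an undirected graph stored as a plain adjacency dictionary and uses breadth-first search to keep every node within a given number of hops of a center node. The function name `ego_graph`, the variable `toy_graph`, and all node labels are hypothetical.

```python
from collections import deque


def ego_graph(adjacency, center, radius):
    """Return the subgraph induced by nodes within `radius` hops of `center`.

    `adjacency` maps each node to an iterable of its neighbors (an assumed,
    simplified representation; a real knowledge graph would carry node and
    edge types as well).
    """
    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue  # do not expand beyond the requested radius
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    kept = set(distance)
    # Keep only edges whose endpoints both lie inside the ego network.
    return {n: sorted(m for m in adjacency.get(n, ()) if m in kept) for n in kept}


# Hypothetical toy graph: a simple chain of six nodes.
toy_graph = {
    "disease_A": ["gene_1"],
    "gene_1": ["disease_A", "pathway_1"],
    "pathway_1": ["gene_1", "gene_2"],
    "gene_2": ["pathway_1", "disease_B"],
    "disease_B": ["gene_2", "phenotype_1"],
    "phenotype_1": ["disease_B"],
}

subgraph = ego_graph(toy_graph, "disease_A", radius=3)
```

In this toy chain, `disease_B` sits four hops from `disease_A`, so it and `phenotype_1` fall outside the radius-3 ego graph. Graph libraries typically offer an equivalent helper; the hand-written version is used here only to keep the sketch dependency-free.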
“…This problem inspired the recent ECoHeN algorithm [36] and is also addressed in proposed alterations to modularity to account for heterogeneous networks [37]; though this latter approach may still be hindered by the resolution limit inherent to modularity-based approaches [38]. Alternatively, it has been shown how node embeddings can be created and used to represent higher-order patterns in biological knowledge graphs more heterogeneous than those used here [39].…”
Section: Declaration of Interests
Citation type: mentioning
confidence: 99%