2020
DOI: 10.1101/2020.11.15.383661
Preprint

Multiscale PHATE Exploration of SARS-CoV-2 Data Reveals Multimodal Signatures of Disease

Abstract: The biomedical community is producing increasingly high dimensional datasets, integrated from hundreds of patient samples, which current computational techniques struggle to explore. To uncover biological meaning from these complex datasets, we present an approach called Multiscale PHATE, which learns abstracted biological features from data that can be directly predictive of disease. Built on a continuous coarse graining process called diffusion condensation, Multiscale PHATE creates a tree of data gr…

Cited by 10 publications (14 citation statements)
References 98 publications

“…This is not a desirable attribute for early detection of new differentiated sub-lineages, which are often sampled in lower numbers compared to the other well-established lineages. We thus explored the use of msPHATE on SARS-CoV-2 genetic data, a novel unsupervised learning technique that showed promising results on biological data (13), to enable the identification of additional structure within haplotype groups using whole viral consensus sequences (Step 8, Figure 1). The method creates a tree of data granularities that can be cut at coarse levels for high-level summarizations of data, or at fine levels for detailed representations on subsets.…”
Section: Results (mentioning)
confidence: 99%
“…The code is available at https://github.com/HussinLab/covid19_mostefai2021_paper. Finally, msPHATE (13) (package available at https://github.com/KrishnaswamyLab/Multiscale_PHATE) embeddings were computed for the first and second waves GISAID high quality consensus sequences. A 0123 encoded matrix was used for each wave to generate the MS-PHATE embeddings.…”
Section: Methods (mentioning)
confidence: 99%
“…2) suggests that the walktrap algorithm provides a flexible and robust way to define super-cells. Practically, we recommend using ∈ [10,50], as this already provides a significant speed-up in the analysis of large datasets.…”
Section: Discussion (mentioning)
confidence: 99%
“…Then, users proceed to expand clusters and investigate lower (and more detailed) hierarchy levels. Although HDR has been an active area of research in the past few years [10,17,32], current techniques present a few limitations. HiPP [30] uses nodes, encoded as circles with varying areas, to create an overview of the dataset and guide users in the further analysis.…”
Section: Introduction (mentioning)
confidence: 99%