2022
DOI: 10.1101/2022.06.24.497555
Preprint

SPLASH: a statistical, reference-free genomic algorithm unifies biological discovery

Abstract: We present a unifying statistical formulation for many fundamental problems in genome science and develop a reference-free, highly efficient algorithm that solves it. Sequence diversification - nucleic acid mutation, rearrangement, and reassortment - is necessary for the differentiation and adaptation of all replicating organisms. Identifying sample-dependent sequence diversification, e.g. adaptation or regulated isoform expression, is fundamental to many biological studies, and is achieved today with next-gen…

Cited by 5 publications (8 citation statements)
References 187 publications
“…In total, we ran NOMAD on 13,500 SmartSeq2 cells from 136 cell types. NOMAD was run with default parameters except for the following parameters for number of random partitions for input cells and number of random hashes for each partition (Chaung et al 2022): L_num_random_Cj = 300 and K_num_hashes = 10.…”
Section: Nomad Runs
mentioning
confidence: 99%
“…We recently introduced NOMAD (Chaung et al 2022), which shows that myriad biological processes that diversify transcripts can be detected with a unified reference-free algorithm, performing inference directly on raw, unaligned sequencing reads. This includes but is not limited to RNA splicing, mutations, RNA editing, and V(D)J recombination.…”
Section: Introduction
mentioning
confidence: 99%
“…Further, single cell sequencing technology and analysis may be under-ascertaining RNA expression due to (i) sampling depth; (ii) poly-A capture bias and (iii) computational algorithms to analyze isoform-specific differences. Through the ReadZS we have collapsed UTR variation to a single scalar value 8,52,56 but we have not explored correlations with RNA splicing or other sequence variants, a topic of further research. Our findings support a model where 3’ UTR regulation at the nucleotide level controls localization through function.…”
Section: Discussion
mentioning
confidence: 99%
“…To address these conceptual and technical challenges for studying sequence variation in RNA or DNA that is sample-dependent, we recently introduced NOMAD (Chaung et al 2022). NOMAD leverages the observation that detecting sample-regulated sequence variation, such as alternative splicing, RNA editing, gene fusions, V(D)J, transposable element mobilization, allele-specific splicing, and genetic variation in a population, among many other regulated events can be unified-in theory and in practice.…”
mentioning
confidence: 99%
“…Each list is used to construct a contingency table, a base data structure used to compute a statistically valid p-value, along with several measures (e.g. effect size) for downstream interpretability (Methods) (Chaung et al 2022). This p-value is constructed using unsupervised optimization detailed in (Baharav, Tse, and Salzman, 2023). This step is also memory-frugal (Figure 1C, green area), because only data of a single anchor needs to reside in main memory, and is represented as a sparse matrix.…”
mentioning
confidence: 99%
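
The last statement above describes building a per-anchor contingency table that is held in memory as a sparse matrix. A minimal Python sketch of that idea follows, assuming the input is a list of (sample, target) observations for a single anchor. The function names and the dict-of-dicts layout are hypothetical and are not taken from NOMAD or SPLASH. The statistic shown is an ordinary Pearson chi-square, swapped in purely for illustration; it is not the p-value construction that the quoted text attributes to Baharav, Tse, and Salzman (2023).

```python
from collections import defaultdict


def build_contingency_table(observations):
    """Count (sample, target) pairs for one anchor.

    Hypothetical helper: returns a sparse table as {target: {sample: count}},
    storing only the nonzero cells.
    """
    table = defaultdict(lambda: defaultdict(int))
    for sample, target in observations:
        table[target][sample] += 1
    return {target: dict(row) for target, row in table.items()}


def pearson_chi_square(table):
    """Pearson chi-square statistic over the full targets-by-samples grid.

    Illustrative stand-in only; cells missing from the sparse table count as 0.
    """
    row_totals = {target: sum(row.values()) for target, row in table.items()}
    col_totals = defaultdict(int)
    for row in table.values():
        for sample, count in row.items():
            col_totals[sample] += count
    total = sum(row_totals.values())

    stat = 0.0
    for target, row_total in row_totals.items():
        for sample, col_total in col_totals.items():
            expected = row_total * col_total / total
            observed = table[target].get(sample, 0)
            stat += (observed - expected) ** 2 / expected
    return stat


# Toy data for one anchor: each sample shows a different target sequence.
observations = [("s1", "A")] * 10 + [("s2", "B")] * 10
table = build_contingency_table(observations)
stat = pearson_chi_square(table)
```

Because only one anchor's table is built at a time and only nonzero cells are stored, memory use in this sketch scales with the number of observed (target, sample) pairs for that anchor rather than with the full grid.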