2016
DOI: 10.1101/051631
Preprint

Visualizing the Structure of RNA-seq Expression Data using Grade of Membership Models

Abstract: Grade of membership models, also known as "admixture models", "topic models" or "Latent Dirichlet Allocation", are a generalization of cluster models that allow each sample to have membership in multiple clusters. These models are widely used in population genetics to model admixed individuals who have ancestry from multiple "populations", and in natural language processing to model documents having words from multiple "topics". Here we illustrate the potential for these models to cluster samples of RNA-seq ge…

Cited by 20 publications (26 citation statements)
References 52 publications
“…using a non-parametric regression method such as trend filtering, to allow for the nonlinear trends that must occur in any cyclic phenomenon) before applying downstream analyses to the residuals. On the other hand, if the downstream methods rely on explicit models for count data (e.g., [37]) then controlling for cell cycle may be more complicated and require further methodological development. However, we note that these issues are not unique to our approach: controlling for cell cycle within count-based analyses poses additional methodological challenges for whatever method is used to estimate cell cycle.…”
Section: Discussion (mentioning)
confidence: 99%
“…The LDA model was originally introduced in the field of language processing to decompose large sets of text documents into topics (a problem known as “topic modelling”), based solely on their word frequency (Blei et al, ). It has been subsequently extended to the analysis of large and complex data sets in various fields, such as satellite image analysis (Vaduva, Gavat, & Datcu, ), fraud detection (Olszewski, ), bioinformatics (Dey, Hsiao, & Stephens, ; Liu, Tang, Dong, Yao, & Zhou, ) and temporal trends in popular music (Mauch, MacCallum, Levy, & Leroi, ). The popular software “Structure” of population genetics uses an almost identical model to characterize population structure based on the distribution of alleles across individuals (Pritchard, Stephens, & Donnelly, ).…”
Section: Introduction (mentioning)
confidence: 99%
“…These include standard k-means clustering, hierarchical clustering, and variants that are specifically designed for scRNA-seq data (i.e. RaceID/RaceID2 [6], CIDR [7]) as well as more advanced methods that utilise likelihood-based mixture modelling (countClust) [8], density-based spatial clustering [9] and kernel-based single-cell clustering (SIMLR) [10]. Several studies have compared and summarised various clustering algorithms used for scRNA-seq data analysis [11,12,13].…”
Section: Introduction (mentioning)
confidence: 99%