2013
DOI: 10.1093/bib/bbt029

A bi-Poisson model for clustering gene expression profiles by RNA-seq

Abstract: With the availability of gene expression data by RNA-seq, powerful statistical approaches for grouping similar gene expression profiles across different environments have become increasingly important. We describe and assess a computational model for clustering genes into distinct groups based on the pattern of gene expression in response to changing environment. The model capitalizes on the Poisson distribution to capture the count property of RNA-seq data. A two-stage hierarchical expectation–maximization (E…

Cited by 7 publications (15 citation statements)
References 31 publications (48 reference statements)
“…This new model displays a tremendous methodological breakthrough embodied in the two following aspects: First, existing gene clustering approaches classify tens of thousands of genes recorded on a single biological entity, such as a cell type, an organ, a treatment, or an individual. Although considerable efforts have been made to cluster genes simultaneously on several entities [24-26], no studies thus far have been able to tackle gene clusters on a high-dimensional set of entities which contain unknown latent components. The new model capitalizes on the advantage of a block mixture model for the simultaneous identification of latent gene clusters or latent genotypes from high-dimensional genes × high-dimensional genotype expression data.…”
Section: Discussion (mentioning)
confidence: 99%
“…Functional clustering, aimed to classify gene expression profiles based on their dynamic changes using parametric or nonparametric approaches [27-29], can be integrated with the block mixture model, which allows dynamic eQTLs for gene clustering to be characterized. Second, to study how the organism responds to changing environment, gene expression experiments frequently include multiple environments or multiple tissues [24-26]. The implementation of the block mixture model with multiple environments enables us to understand the impact of genotype-environment interactions on regulatory machineries.…”
Section: Discussion (mentioning)
confidence: 99%
“…For example, both the k-means and the self-organising map algorithms require the number of clusters as input. Similarly, methods that model the data as a finite mixture of Poisson or negative binomial distributions [61-63] require prior knowledge of the number of mixture components. Estimating the number of clusters usually makes use of an optimality criterion, such as the Bayesian information criterion or the Akaike information criterion, which requires repeated application of the algorithm on the same dataset with different initial choices of the number of clusters.…”
Section: Discussion (mentioning)
confidence: 99%
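
The repeated-fitting procedure described in this statement can be illustrated with a minimal sketch. It assumes a one-dimensional finite Poisson mixture fitted by a plain EM loop and scored with the Bayesian information criterion; it is not the two-stage hierarchical algorithm of the cited paper, and the names `fit_poisson_mixture`, `bic`, and `select_n_components` are hypothetical helpers introduced only for illustration.

```python
import math

def poisson_logpmf(k, lam):
    """Log of the Poisson probability mass at count k with rate lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def e_step(counts, weights, rates):
    """Return per-observation responsibilities and the total log-likelihood."""
    resp, loglik = [], 0.0
    for x in counts:
        logs = [math.log(w) + poisson_logpmf(x, lam)
                for w, lam in zip(weights, rates)]
        top = max(logs)  # log-sum-exp trick for numerical stability
        log_norm = top + math.log(sum(math.exp(v - top) for v in logs))
        loglik += log_norm
        resp.append([math.exp(v - log_norm) for v in logs])
    return resp, loglik

def fit_poisson_mixture(counts, k, n_iter=200):
    """EM for a k-component Poisson mixture (hypothetical helper)."""
    n = len(counts)
    ordered = sorted(counts)
    # Assumption: deterministic start from evenly spaced sample quantiles.
    rates = [max(float(ordered[int((j + 0.5) / k * n)]), 0.5) for j in range(k)]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        resp, _ll = e_step(counts, weights, rates)
        for j in range(k):
            mass = max(sum(row[j] for row in resp), 1e-12)
            weights[j] = max(mass / n, 1e-12)
            rates[j] = max(sum(row[j] * x for row, x in zip(resp, counts)) / mass, 1e-6)
    _resp, loglik = e_step(counts, weights, rates)
    return weights, rates, loglik

def bic(loglik, k, n):
    """BIC with k rates plus k - 1 free mixing weights."""
    return -2.0 * loglik + (2 * k - 1) * math.log(n)

def select_n_components(counts, max_k):
    """Refit for k = 1..max_k and return the k with the lowest BIC."""
    scores = {k: bic(fit_poisson_mixture(counts, k)[2], k, len(counts))
              for k in range(1, max_k + 1)}
    return min(scores, key=scores.get), scores
```

Calling `select_n_components(counts, max_k)` on a list of read counts refits the mixture once per candidate number of components, which is the cost the statement points out. In practice EM would also be restarted from several initial values, which this sketch omits.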
“…The expression reads of genes in each treatment are thought to obey a Poisson distribution [19], thus the distribution of the read differences between the two cell types is modeled by the Skellam function [22, 23], expressed as …”
Section: Model (mentioning)
confidence: 99%
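
For orientation, the difference of two independent Poisson counts follows the Skellam distribution. The sketch below is not the expression elided from the statement above; it computes the Skellam probability mass directly from its definition as a convolution of two Poisson distributions, using only the standard library. The function names and the truncation length `n_terms` are assumptions made for illustration, and the truncation is only adequate for moderate rates.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability mass at count k with rate lam, via logs."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def skellam_pmf(d, mu1, mu2, n_terms=200):
    """P(X1 - X2 = d) for independent X1 ~ Poisson(mu1), X2 ~ Poisson(mu2).

    Sums P(X1 = d + j) * P(X2 = j) over j, starting where d + j >= 0.
    Assumption: n_terms is large enough that the dropped tail is negligible.
    """
    start = max(0, -d)
    return sum(poisson_pmf(d + j, mu1) * poisson_pmf(j, mu2)
               for j in range(start, start + n_terms))
```

For example, `skellam_pmf(2, 5.0, 3.0)` gives the probability that a gene has exactly two more reads in the first treatment than in the second when the two Poisson rates are 5 and 3. The distribution has mean `mu1 - mu2` and variance `mu1 + mu2`.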
“…More recently, Wang et al. [19] developed a bi-variate Poisson model to cluster genes expressed in two different environments. Jiang et al. [20] derived an algorithmic model for clustering genes based on their environment-induced differentiation patterns.…”
Section: Introduction (mentioning)
confidence: 99%