2021
DOI: 10.1093/nargab/lqab005

Probabilistic outlier identification for RNA sequencing generalized linear models

Abstract: Relative transcript abundance has proven to be a valuable tool for understanding the function of genes in biological systems. For the differential analysis of transcript abundance using RNA sequencing data, the negative binomial model is by far the most frequently adopted. However, common methods that are based on a negative binomial model are not robust to extreme outliers, which we found to be abundant in public datasets. So far, no rigorous and probabilistic methods for detection of outliers have been devel…

Citations: cited by 9 publications (9 citation statements)
References: 34 publications (37 reference statements)
“…We show that the data property which causes LSCO and RidgeCO to produce megahubs is found in biological data. This is further supported by previous work studying outlier detection and handling in both microarray ( Kadota et al , 2003 ; Shieh and Hung, 2009 ; Yang et al , 2009 ) and sequencing data ( Love et al , 2014 ; Mangiola et al , 2021 ). For GRNI, an example of a cause for such an outlier is the inclusion of a gene that is not part of the studied system.…”
Section: Discussion (supporting)
confidence: 80%
“…The metastatic condition (non-metastatic, low or high-burden) was chosen as the only covariate. We detected outlier sample-gene observations using ppcseq [86]. The significant genes that included outliers were filtered out.…”
Section: Methods (mentioning)
confidence: 99%
“…This fit will likely be not biased by outliers and can produce reliable posterior probability distribution to base an accurate outlier identification. The posterior predictive distribution is then produced adjusting for observation censoring (42). This adjustment is necessary because eliminating data at the distribution’s tails leads to downwards biases for the estimated variance.…”
Section: Methods (mentioning)
confidence: 99%
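
The downward bias described in the statement above can be illustrated with a small simulation. This is a minimal sketch, not the procedure used by the cited work: the function name `trimmed_variance_ratio`, the mean, the dispersion, and the 5% tail fraction are all assumptions chosen for illustration. It draws negative binomial counts, discards both tails, and compares the variance of what is left with the variance of the full sample.

```python
import numpy as np


def trimmed_variance_ratio(mean=100.0, dispersion=0.2, n=200_000, tail=0.05, seed=0):
    """Return var(trimmed sample) / var(full sample) for negative binomial counts.

    Hypothetical helper: all parameter values are illustrative assumptions.
    The negative binomial is parameterised so that var = mean + dispersion * mean**2.
    """
    rng = np.random.default_rng(seed)
    size = 1.0 / dispersion          # NumPy's "n" (number of successes)
    p = size / (size + mean)         # NumPy's success probability
    counts = rng.negative_binomial(size, p, n)

    # Discard observations below the lower and above the upper tail quantile.
    lo, hi = np.quantile(counts, [tail, 1.0 - tail])
    kept = counts[(counts >= lo) & (counts <= hi)]

    return kept.var() / counts.var()


ratio = trimmed_variance_ratio()
print(f"variance ratio after trimming both tails: {ratio:.2f}")
```

A ratio below 1 means the trimmed data look less variable than the distribution that generated them, which is why a model refitted on tail-censored data would need an adjustment of the kind the statement mentions.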
“…A robust iterative strategy for outlier identification was developed for negative-binomial data from bulk RNA sequencing (42). Such a strategy is necessary because a fit that includes outliers makes the model biased by definition and produces skewed estimates.…”
Section: Methods (mentioning)
confidence: 99%
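
To make the point about skewed estimates concrete, the sketch below fits a negative binomial by the method of moments with and without a single extreme count. It is a simplified stand-in, not the Bayesian fit used in the cited work: the helper `moment_fit` and the example counts are hypothetical and chosen only for illustration.

```python
import numpy as np


def moment_fit(counts):
    """Method-of-moments negative binomial estimates (hypothetical helper).

    Returns (mu, phi) under the parameterisation var = mu + phi * mu**2.
    phi is floored at 0 when the sample is not overdispersed.
    """
    counts = np.asarray(counts, dtype=float)
    mu = counts.mean()
    var = counts.var(ddof=1)
    phi = max((var - mu) / mu**2, 0.0)
    return mu, phi


# Made-up counts for one gene across ten samples, plus one extreme outlier.
clean = np.array([95, 102, 110, 98, 105, 91, 108, 99, 103, 97])
with_outlier = np.append(clean, 5000)

mu_clean, phi_clean = moment_fit(clean)
mu_out, phi_out = moment_fit(with_outlier)

print(f"without outlier: mean={mu_clean:.1f}, dispersion={phi_clean:.3f}")
print(f"with outlier:    mean={mu_out:.1f}, dispersion={phi_out:.3f}")
```

One extreme observation inflates both the estimated mean and the estimated dispersion, so a model fitted this way can make the outlier itself look plausible. That masking effect is the motivation for refitting after suspected outliers are set aside.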