2017
DOI: 10.7717/peerj.3091

Effect of method of deduplication on estimation of differential gene expression using RNA-seq

Abstract: Background: RNA-seq is a useful tool for analysis of gene expression. However, its robustness is greatly affected by a number of artifacts. One of them is the presence of duplicated reads. Results: To infer the influence of different methods of removal of duplicated reads on estimation of gene expression in cancer genomics, we analyzed paired samples of hepatocellular carcinoma (HCC) and non-tumor liver tissue. Four protocols of data analysis were applied to each sample: processing without deduplication, deduplicat…

Cited by 21 publications (19 citation statements)
References 33 publications
“…Fig. S6) showing that other sources of noise were major contributors to AI overdispersion. Deduplication can lead to loss of large amounts of legitimate data, and may have other undesirable impacts, such as distorting signal distribution in the biological sample [35][36][37]. Thus, from a practical standpoint, read deduplication has limited utility, and its impact on AI overdispersion is accounted for in the QCC analysis.…”
Section: Sources of AI Overdispersion (mentioning)
confidence: 99%
“…Detailed analysis of specific protocols is outside the scope of this work. However, one common concern in deep sequencing experiments is the impact of PCR amplification artifacts [35][36][37].…”
Section: Sources of AI Overdispersion (mentioning)
confidence: 99%
“…Duplicate reads (light green) can include technical artifacts as well as reads that represent true transcript abundance. Removal of duplicate reads reduces false positives but also causes false negatives (23). We include duplicate reads in gene expression quantification.…”
Section: Composition of RNA-seq Datasets (mentioning)
confidence: 99%
“…These methods include the use of exogenous molecular barcodes (unique molecular identifiers, UMIs), endogenous position-based method, sequencing technical replicates [20,22-26] and background error modeling [20,26,27]. The UMI strategy is an effective way to remove stochastic sequencing errors [20,28-30] and duplicates, which can improve the accuracy of low-frequency variant detection and solve severe quantitative bias in RNA-seq [31]. However, UMIs' universal application is limited by their experimental design [32].…”
Section: Introduction (mentioning)
confidence: 99%
“…The endogenous position-based method is an alternative way to deal with duplicates and remove errors. Modules in popular tools such as SAMtools [33] and Picard (http://broadinstitute.github.io/picard/) use this approach to mark duplicates, select a representative read and further improve calling results and RNA quantification [31]. However, these tools are based on 5′ prime position of a read and do not use full segment information.…”
Section: Introduction (mentioning)
confidence: 99%
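
The position-based and UMI-based strategies described in the citation statements above can be illustrated with a minimal sketch. This is not the SAMtools or Picard implementation; the `Read` record, its field names, and the use of mean base quality to choose the representative read are assumptions made only for this example.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical, simplified aligned read (not a real SAM/BAM record)."""
    name: str
    chrom: str
    five_prime_pos: int   # 5' coordinate of the alignment
    strand: str           # "+" or "-"
    mean_quality: float   # assumed score for choosing a representative
    umi: str = ""         # unique molecular identifier, if the library has one


def mark_duplicates(reads, use_umi=False):
    """Group reads by 5' position (optionally also by UMI).

    Returns (kept, duplicates): one representative per group, chosen as
    the read with the highest mean quality, and all remaining reads.
    """
    groups = defaultdict(list)
    for read in reads:
        key = (read.chrom, read.five_prime_pos, read.strand)
        if use_umi:
            # Reads at the same position with different UMIs are treated
            # as distinct molecules rather than duplicates.
            key += (read.umi,)
        groups[key].append(read)

    kept, duplicates = [], []
    for group in groups.values():
        best = max(group, key=lambda r: r.mean_quality)
        kept.append(best)
        duplicates.extend(r for r in group if r is not best)
    return kept, duplicates


reads = [
    Read("r1", "chr1", 100, "+", 30.0, "AAAA"),
    Read("r2", "chr1", 100, "+", 35.0, "AAAA"),
    Read("r3", "chr1", 100, "+", 20.0, "CCCC"),
    Read("r4", "chr1", 200, "-", 25.0, "AAAA"),
]
kept_pos, dup_pos = mark_duplicates(reads)
kept_umi, dup_umi = mark_duplicates(reads, use_umi=True)
```

In this toy input, position-only grouping discards `r3` even though its UMI differs from `r1` and `r2`, which mirrors the concern quoted above that deduplication can remove reads reflecting true transcript abundance; the UMI-aware grouping retains it.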