2021
DOI: 10.1371/journal.pcbi.1009631

CStone: A de novo transcriptome assembler for short-read data that identifies non-chimeric contigs based on underlying graph structure

Abstract: With the exponential growth of sequence information stored over the last decade, including that of de novo assembled contigs from RNA-Seq experiments, quantification of chimeric sequences has become essential when assembling read data. In transcriptomics, de novo assembled chimeras can closely resemble underlying transcripts, but patterns such as those seen between co-evolving sites, or mapped read counts, become obscured. We have created a de Bruijn based de novo assembler for RNA-Seq data that utilizes a cla…

Cited by 5 publications (3 citation statements)
References: 68 publications
“…Given that the amount of DNA involved in these studies is minimal, MDA is applied to amplify the DNA for subsequent sequencing, but the chimerism generated during the process can greatly impact the overall accuracy. Researches indicate that the chimerism in MDA sequencing data cannot be ignored and is attracting increasing attention, especially as single-cell studies have become a hot topic [13], [51], [52], [53], [54].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Background count variation refers to varying count levels associated with individual transcripts that are not maintained across replicates of a condition, thus effectively reflecting intra-condition noise, whilst specifying a subset of transcripts to be over represented across replicates of a condition reflects identifiable over expressed transcripts. A similar experiment to those described here, but in relation to the effects of chimerism on the results of differential expression, has been described in Linheiro and Archer (2021) [50].…”
Section: Methods
Citation type: mentioning
confidence: 60%
“…High intra-condition count variation at an inter, and to a lesser extent intra, study level can arise from a range of sources including i) biological differences between samples such as age, sex, diet, and health; ii) in silica error involving assembly tools producing poorly understood chimeras within the reference transcriptome [50, 60, 61]; iii) ambiguities in read mapping to such references [62]; iv) normalization of count data derived from such mapped reads [63]; and v) including in vitro error during library preparation protocols [64, 65]. Although we used DESeq2 within our study, the results of our exploration on the effects of intra-condition variation in the detection of differentially expressed transcripts likely applies to other software used for differential expression analysis that rely on per transcript count information across replicates for the estimation of transcript abundance and dispersion, for example, edgeR [10], BBSeq [66], DSS [67], baySeq [68] and ShrinkBayes [69].…”
Section: Discussion
Citation type: mentioning
confidence: 99%