2019
DOI: 10.1038/s41598-019-44499-3

Effect of de novo transcriptome assembly on transcript quantification

Abstract: Correct quantification of transcript expression is essential for understanding functional elements under different physiological conditions. For organisms without a reference transcriptome, de novo transcriptome assembly must be carried out prior to quantification. However, the large number of erroneous contigs produced by assemblers might result in unreliable estimates. This study therefore investigates how assembly quality affects the performance of quantification based on …

Cited by 42 publications (31 citation statements)
References 42 publications
“…The de novo assemblies proved better than the assembled genomes with fewer fragmented contigs of candidate genes of organ size and flower color. Similar to other recent studies, there is a wide distribution of quality of assemblies based on the programs and methods used [137,138]; in our case, we saw that SPAdes outperformed Trinity in regards to completeness of BUSCO genes and number of contigs. Adding just over a million long-reads to the N. sylvestris transcriptome decreased the number of contigs by nearly 7000 and increased the N50 values of the assembly by approximately 500 bp while increasing the complete single copy BUSCO genes from 78.5% to 86.4% (Figure 3).…”
Section: De Novo Transcriptome Assembly
supporting
confidence: 88%
“…2). But, even if de novo assembled transcriptomes are acceptable, bioinformatic tools for RNA-seq are not specifically designed for them due to the low annotation rates and biases in FDR corrections than are exacerbated by redundant transcripts 27 . This is why improved R scripts of DEGenes Hunter were developed in this study to analyse differential expression patterns using SOLSEv5.0 34 as reference.…”
Section: Discussion
mentioning
confidence: 99%
“…Previous assembled transcriptomes in sole reported a high number of transcripts 7,8,24,25 that clearly exceeded the expected number of predicted genes as reported in closely related flatfish (about 21,000 protein-coding genes 1,3,26 ). Accurate transcript quantification for gene expression studies is hindered by over-represented transcriptomes 27 , resulting in biased transcript discovery, over-estimation of family-collapsed contigs, and under-estimation of redundant contigs 11,27 . Bioinformatic strategies to reduce artefacts and redundancy have been implemented in sole 7 , but the result was still far from being optimal, and further polishing to increase tissue representativity, transcript completeness and annotations is required.…”
mentioning
confidence: 99%
“…However, for contigs containing the venom peptides open reading frames (ORF), the assembler often generates overextended contigs (see Table S1). Thus, the expression rate of the short venom peptides transcripts would be underestimated with TPM [51]. So we calculated the RPM value for each transcript of interest in two steps: (i) by dividing the number of aligned reads for each contig by the total number of million reads aligned for the sample, and (ii) by summing up the obtained values for each contig encoding the transcript when several contigs represent the same peptide.…”
Section: Contigs Quantification
mentioning
confidence: 99%
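
The two-step RPM calculation described in the last citation statement can be sketched as follows. This is a minimal illustration, not the citing authors' code: the function name `rpm_per_transcript`, the contig and peptide identifiers, and the read counts are all hypothetical, and the sketch assumes that per-contig aligned read counts and a contig-to-transcript mapping are already available.

```python
from collections import defaultdict


def rpm_per_transcript(aligned_reads, contig_to_transcript, total_aligned_reads):
    """Compute reads per million (RPM) per transcript of interest.

    aligned_reads: dict of contig id -> number of reads aligned to that contig
    contig_to_transcript: dict of contig id -> transcript (peptide) it encodes
    total_aligned_reads: total number of reads aligned for the sample
    All names and data shapes here are assumptions made for illustration.
    """
    if total_aligned_reads <= 0:
        raise ValueError("total_aligned_reads must be positive")

    millions_aligned = total_aligned_reads / 1_000_000
    rpm = defaultdict(float)
    for contig, reads in aligned_reads.items():
        transcript = contig_to_transcript.get(contig)
        if transcript is None:
            continue  # contig does not encode a transcript of interest
        # Step (i): normalise this contig's reads by millions of aligned reads.
        # Step (ii): sum over all contigs that encode the same transcript.
        rpm[transcript] += reads / millions_aligned
    return dict(rpm)


# Hypothetical sample: two contigs encode the same peptide, one encodes another.
example_reads = {"contig_1": 200, "contig_2": 300, "contig_3": 50}
example_map = {"contig_1": "peptide_A", "contig_2": "peptide_A", "contig_3": "peptide_B"}
example_rpm = rpm_per_transcript(example_reads, example_map, 2_000_000)
```

Unlike TPM, this value is not divided by contig length, which is consistent with the stated concern that overextended contigs would lower the TPM of short peptide transcripts.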