2017
DOI: 10.1371/journal.pone.0185020

Challenges and advances for transcriptome assembly in non-model species

Abstract: Analyses of high-throughput transcriptome sequences of non-model organisms are based on two main approaches: de novo assembly and genome-guided assembly using mapping to assign reads prior to assembly. Given the limits of mapping reads to a reference when it is highly divergent, as is frequently the case for non-model species, we evaluate whether using blastn would outperform mapping methods for read assignment in such situations (>15% divergence). We demonstrate its high performance by using simulated reads o…

Cited by 38 publications
(27 citation statements)
References 47 publications
“…In recent years, RNA sequencing (RNA-seq) technology has been widely used to reveal the difference in gene transcriptional levels for many plants with the development of next generation sequencing [30]. Especially for no-model plants, these organisms were the absence of detailed genetic information or closest reference genome [31]. In order to avoid the problem of homogenetic amplification caused by non-whole genome sequencing, RNA sequencing (RNA-seq) is a quick way to obtain coding sequences, discover new genes by constructing the genomic library (cDNA) or the transcription database [32].…”
Section: Introduction (mentioning)
confidence: 99%
“…Of note, consistent with all short-read assemblers (Ungaro et al, 2017), the ORP assemblies may not accurately reflect the true isoform complexity. Specifically, because of the way that single representative transcripts are chosen from a cluster of related sequences, some transcriptional complexity may be lost.…”
Section: Results (mentioning)
confidence: 98%
“…In addition to reporting synthetic metrics related to assembly structure, reports individual metrics related to specific elements of assembly quality. One such metric estimates the rate of chimerism, a phenomenon which is known to be problematic in de novo assembly ( Ungaro et al, 2017 ; Singhal, 2013 ). Rates of chimerism are relatively constant between all assemblers, ranging from 10% for the assembly, to 12% for the assembly.…”
Section: Results (mentioning)
confidence: 99%
“…It is important to note that while overlapping genes may serve as the foundation to indicate relevant functions across species, trophic levels, and/or habitats, increased coverage provided extended details that may be useful in a targeted study, especially if the target environmental contaminants-affected transcript is not represented in low coverage data. Furthermore, additional optimization techniques of de novo assembly of non-model organisms may enhance transcriptomic information, supporting downstream analysis [63, 64]. Future work will focus on evaluating the effect of different depths of sequencing combined with de novo assembly parameters (e.g., k-mer length) to determine their effect on discovery of contaminant-affected transcripts.…”
Section: Discussion (mentioning)
confidence: 99%