2016
DOI: 10.1534/g3.116.032805

Assessing the Gene Content of the Megagenome: Sugar Pine (Pinus lambertiana)

Abstract: Sugar pine (Pinus lambertiana Douglas) is within the subgenus Strobus with an estimated genome size of 31 Gbp. Transcriptomic resources are of particular interest in conifers due to the challenges presented in their megagenomes for gene identification. In this study, we present the first comprehensive survey of the P. lambertiana transcriptome through deep sequencing of a variety of tissue types to generate more than 2.5 billion short reads. Third generation, long reads generated through PacBio Iso-Seq have be…

Cited by 49 publications
(48 citation statements)
References 78 publications
“…Additional scaffolding steps used a set of transcript sequences assembled from Pacific Biosciences (PacBio) and Illumina RNA-seq data (Table S7) from Gonzalez-Ibeas et al (2016). We aligned transcript sequences to the whole genome shotgun (WGS) scaffolds using both nucmer (-maxmatch -nosimplify -l 45 -c 4) (Kurtz et al 2004) and bwa-mem (-k 45 -O 60 -E 10) (Li 2013).…”
Section: Transcriptome Scaffolding (mentioning)
confidence: 99%
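The two aligner invocations quoted in this statement can be written out as a minimal sketch. Only the flag values come from the statement itself; the file names (`wgs_scaffolds.fa`, `transcripts.fa`) and the helper function names are hypothetical, and the sketch only builds the argument lists rather than running the tools. For orientation: in nucmer, `-l` is the minimum exact-match length and `-c` the minimum cluster length; in bwa mem, `-k` is the minimum seed length, `-O` the gap-open penalty, and `-E` the gap-extension penalty.

```python
# Flag values are copied from the quoted citation statement.
# File names and function names are hypothetical placeholders.

def nucmer_command(reference_fasta, query_fasta):
    """Argument list for nucmer with the flags quoted in the statement."""
    return [
        "nucmer",
        "-maxmatch",    # use all anchor matches, regardless of uniqueness
        "-nosimplify",  # do not simplify alignments by removing shadowed clusters
        "-l", "45",     # minimum length of an exact match
        "-c", "4",      # minimum length of a cluster of matches
        reference_fasta,
        query_fasta,
    ]


def bwa_mem_command(indexed_reference, query_fasta):
    """Argument list for bwa mem with the flags quoted in the statement.

    Assumes the reference was indexed beforehand with `bwa index`.
    """
    return [
        "bwa", "mem",
        "-k", "45",  # minimum seed length
        "-O", "60",  # gap-open penalty
        "-E", "10",  # gap-extension penalty
        indexed_reference,
        query_fasta,
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them.
    print(" ".join(nucmer_command("wgs_scaffolds.fa", "transcripts.fa")))
    print(" ".join(bwa_mem_command("wgs_scaffolds.fa", "transcripts.fa")))
```

The argument lists could be passed to `subprocess.run` on a machine where both tools are installed; how the citing authors actually scripted these steps is not stated in the excerpt.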
“…The family Pinaceae, with their haploid genome sizes ranging between ~10 and ~36 Gb (De La Torre et al, 2014), is one of such groups. Owing to a combination of sequencing techniques and, more importantly, to a series of novel bioinformatic strategies and tools, the enormous genomes of several Pinaceae have been recently sequenced and assembled (Birol et al, 2013; Nystedt et al, 2013; Neale et al, 2014, 2017; Wegrzyn et al, 2014; Zimin et al, 2014; Warren et al, 2015; Gonzalez-Ibeas et al, 2016; Stevens et al, 2016). Given the size and repetitiveness of these genomes, the resulting assemblies tend to be highly fragmented, with a negative effect on the accuracy of gene annotation.…”
Section: Introduction (mentioning)
confidence: 99%
“…Additionally, the high number of transposable elements (TEs) found in conifer genomes may have been erroneously annotated as 'host' genes. However, predicted Pinaceae genes with typical TE domains have been removed from the final gene sets in the analyzed species (Nystedt et al, 2013; Wegrzyn et al, 2014; Gonzalez-Ibeas et al, 2016; Neale et al, 2017).…”
Section: Introduction (mentioning)
confidence: 99%
“…Considering the large and complex genomes of conifers, reference transcriptomes are increasingly being used as a reference resource in a variety of applications (Müller et al, 2015;Suren et al, 2016;Wachowiak et al, 2015). Particularly in Pinaceae, where the vast majority of transcriptomes have been generated in gymnosperms (López de Heredia & Vázquez-Poletti, 2016), the number of assembled contigs (transcripts) is always larger (Table S6) than the actual number of genes estimated based on their genome sequence (Gonzales-Ibeas et al, 2016;Neale et al, 2014;Nystedt et al, 2013).…”
Section: Discussion (mentioning)
confidence: 99%
“…This combined set of P. sylvestris contigs was used for secondary assembly using OGA (Ruttink et al, 2013) with previously published proteomes from Pinus taeda and Pinus lambertiana (Figure 1) to guide the assembly. We either used 1) all annotated proteins (ALL) from P. taeda v1.0; or 2) all annotated proteins (ALL) from P. taeda v2.01 (Neale et al, 2014); or 3) all annotated proteins from P. lambertiana v1.0 (Gonzales-Ibeas, Martinez-Garcia, Famula, & Delfino-Mix, 2016;Stevens et al, 2016); or 4) only the high quality curated proteins (HQ) from P. taeda; or 5) from P. lambertiana (Table 1). Briefly, OGA first uses sequence similarity (tBLASTn) to the proteomes of the reference species to select allelic and fragmented contigs from all genotypes (assembled individually) per reference protein, then applies CAP3 clustering on a gene-by-gene basis (Ruttink et al, 2013), and finally selects the most likely orthologous CAP3 contigs per protein of the reference species.…”
Section: Secondary Clustering: the Orthology Guided Assembly (Oga) Ap (mentioning)
confidence: 99%