How complete are “complete” genome assemblies?—An avian perspective

Peona, Valentina; Weissensteiner, Matthias H.; Suh, Alexander

doi:10.1111/1755-0998.12933

Cited by 115 publications

(143 citation statements)

References 49 publications

Mentioning: 134


“…This assembly consists of 62,122 scaffolds with a N50 of 52,818 bp (Table ) and was used to resolve the barn owl's position in the bird tree of life (Jarvis et al, ; Prum et al, ) and to search for genes associated with low‐light vision (Hanna et al, ; Hoglund et al, ; Le Duc et al, ; Wu et al, ). However, the use of draft genome entails problems such as noncontiguous assembly and missing genes, especially in GC‐rich portions of bird genomes (Peona, Weissensteiner, & Suh, ). As shown by Warren et al (), adding long reads such as those obtained from single‐molecule real‐time (SMRT, Pacific Biosciences, thereafter called PacBio) improves genome completeness and does not suffer from PCR amplification bias for the sequencing at GC or AT genome‐rich region.…”

Section: Introduction

mentioning

confidence: 99%

“…Thus, due to technical difficulty many genes remain non- or partially sequenced in birds (Botero-Castro et al, 2017). A recent study estimated the proportion of missing genome in typical bird assemblies at ~20% (Peona et al, 2018). However, in the European barn owl genome we retrieved genes missing in chicken or other [Figure 6: Avian phylogenetic trees based on the American and European barn owl proteins predicted with the American and European barn owl annotations.]…”

mentioning

confidence: 99%


New genome assembly of the barn owl (Tyto alba alba)

Ducrest, Neuenschwander, Schmid‐Siegert, et al. 2020

Ecology and Evolution


New genomic tools open doors to study ecology, evolution, and population genomics of wild animals. For the Barn owl species complex, a cosmopolitan nocturnal raptor, a very fragmented draft genome was assembled for the American species (Tyto furcata pratincola) (Jarvis et al. 2014). To improve the genome, we assembled de novo Illumina and Pacific Biosciences (PacBio) long reads sequences of its European counterpart (Tyto alba alba). This genome assembly of 1.219 Gbp comprises 21,509 scaffolds and results in a N50 of 4,615,526 bp. BUSCO (Universal Single‐Copy Orthologs) analysis revealed an assembly completeness of 94.8% with only 1.8% of the genes missing out of 4,915 avian orthologs searched, a proportion similar to that found in the genomes of the zebra finch (Taeniopygia guttata) or the collared flycatcher (Ficedula albicollis). By mapping the reads of the female American barn owl to the male European barn owl reads, we detected several structural variants and identified 70 Mbp of the Z chromosome. The barn owl scaffolds were further mapped to the chromosomes of the zebra finch. In addition, the completeness of the European barn owl genome is demonstrated with 94 of 128 proteins missing in the chicken genome retrieved in the European barn owl transcripts. This improved genome will help future barn owl population genomic investigations.


“…Furthermore, the linked-read technology is still very new with ongoing developments and improvements of analytical tools and algorithms constantly being made. For instance, after the initial release of the Supernova assembly algorithm by 10X Genomics (v.1.1, Weisenfeld et al 2017 -used Together in combination with other sequencing technologies such as Hi-C or long-reads (Peona et al 2018), there is certainly room for further improvements towards a reference genome. Nevertheless, the current draft assembly ZSil_MB_1.0 marks an essential progress towards unraveling the genomic basis of diversification in a 'great speciator' system.…”

Section: Results

mentioning

confidence: 99%

Genome Report: De novo genome assembly and annotation for the Taita white-eye (Zosterops silvanus)

Engler, Lawrie, Gansemans, et al. 2020

Preprint

Self Cite


The Taita White-eye (Zosterops silvanus) is an endangered songbird endemic to the Taita Hills of Southern Kenya, where it is confined to small areas of fragmented forest. With diversification rates exceeding those reported in most other vertebrates, White-eyes are a prime example of a 'great speciator'. Nevertheless, we still know surprisingly little about the genomic underpinnings leading to this extraordinary fast radiation. Here, we present a draft genome assembly (ZSil_MB_1.0) for the Taita White-eye generated from a blood sample of a wild, female bird captured in the Taita Hills, Kenya. By performing a de novo assembly with linked-reads and annotation of the assembly with the MAKER pipeline, we generated a 1.069 Gb assembly with a scaffold N50 of 1.105 Mb and an L50 of 244. After quality evaluation of the assembly, we identified 92.1% of BUSCOs complete or fragmented, indicating that our de novo assembly is of high quality. This new assembly provides a genomic resource for future studies into the evolutionary and comparative genomics of this rapidly diversifying group of birds.
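
The scaffold N50 and L50 values quoted in assembly abstracts such as the one above are summary statistics of the scaffold length distribution. The following is a minimal sketch, assuming the common definition: N50 is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly, and L50 is the number of scaffolds in that set. The function name `n50_l50` and the toy lengths are illustrative only and are not taken from any of the cited papers or their pipelines.

```python
def n50_l50(lengths):
    """Return (N50, L50) for an iterable of scaffold lengths in bp."""
    ordered = sorted(lengths, reverse=True)  # longest scaffolds first
    if not ordered:
        raise ValueError("need at least one scaffold length")
    half_total = sum(ordered) / 2  # half of the total assembly size
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first scaffold that pushes the running sum to at least
        # half of the assembly defines N50; its rank is L50.
        if running >= half_total:
            return length, count


# Hypothetical toy assembly: six scaffolds, 290 bp in total.
toy_lengths = [80, 70, 50, 40, 30, 20]
n50, l50 = n50_l50(toy_lengths)
print(f"N50 = {n50} bp, L50 = {l50}")
```

A higher N50 together with a lower L50 indicates a more contiguous assembly, but neither statistic says anything about how much of the genome is missing from the assembly, which is the concern raised by the cited paper.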


“…Third, analyses based upon genome assemblies will depend strongly on the quality of the assembly. Even for genomes of reasonably high quality, protein coding genes may be missing, either through problems with the annotation process or due to the fact that genes fall into assembly gaps (Peona, Weissensteiner, & Suh, 2018). In other words, genome assembly is not necessarily a panacea for all problems related to expression analyses.…”

Section: Conclusion

mentioning

confidence: 99%

Error, noise and bias in de novo transcriptome assemblies

Freedman, Clamp, Sackton 2019

Preprint


De novo transcriptome assembly is a powerful tool, widely used over the last decade for making evolutionary inferences. However, it relies on two implicit, untested assumptions: that the assembled transcriptome represents an unbiased, if incomplete, representation of the underlying expressed transcriptome, and that expression estimates from the assembly are good, if noisy approximations of the relative abundance of expressed transcripts. Using publicly available data for model organisms, we demonstrate that, across assembly algorithms, species, and data sets, these assumptions are consistently violated. Using standard filtering approaches, coverage of annotated genes by transcriptome assemblies falls far below complete coverage, even at the […] less appropriate for studies that seek to understand patterns of genetic variation or gene expression across populations or closely related species. Therefore, we focus on methodological considerations for this class of investigations.

2. MATERIALS AND METHODS

Reference genomes

For the purpose of benchmarking de novo transcriptome assemblies, and comparing assembly and reference-based expression estimates, we downloaded from ENSEMBL genome and gtf annotation files for the following organisms: house mouse, Mus musculus C57BL/6J (GRCm38); clawed frog, Xenopus tropicalis (JGI_4.2); pufferfish, Tetraodon nigroviridis (TETRAODON8); and fly, Drosophila melanogaster (BDGP6).

RNA-seq data

Brain tissue is both transcriptionally complex and expected to express a large proportion of an organism's overall transcriptional profile. For this reason, the bulk of our analyses focus on de novo assemblies for data generated for experiments involving mouse (Mus spp.) brain. These data sets include the Mus musculus (C57BL/6J) dendritic cell data used in the original Trinity paper (MDC), a pool of six whole brains from albino inbred Mus (BALB/c), and 8-sample pools of whole brain samples from wild M. musculus domesticus from Massif Central, France (FRA), Iran (IRN), Kazakhstan (KZK), and Germany (DEU). To assess the generality or particular results with respect to assembly composition, we also generated assemblies for pufferfish (Tetraodon nigroviridis) whole brain, clawed frog (Xenopus tropicalis) kidney, and fly (Drosophila melanogaster) heads. Data set SRA accessions, sequencing strategy and depth are summarized in Supplementary Table 1. All libraries except for MDC were sequenced on an Illumina HiSeq 2000; MDC was sequenced on an Illumina GAII.

2.3 Short read processing

After an initial assessment of sequences reads with FASTQC
