2019
DOI: 10.1101/678771
Preprint

Accurate assembly of the olive baboon (Papio anubis) genome using long-read and Hi-C data

Abstract: Besides macaques, baboons are the most commonly used nonhuman primate in biomedical research. Despite this importance, the genomic resources for baboons are quite limited. In particular, the current baboon reference genome Panu_3.0 is a highly fragmented, reference-guided (i.e., not fully de novo) assembly, and its poor quality inhibits our ability to conduct downstream genomic analyses. Here we present a truly de novo genome assembly of the olive baboon (Papio anubis) that uses data from several recently deve…

Cited by 9 publications (10 citation statements)
References 51 publications (47 reference statements)
“…The use of less GC-biased long reads of single DNA molecules and ultra-long reads spanning repeated genomic regions provides a powerful solution for obtaining assemblies with high contiguity and completeness, although long-read sequencing has limited accuracy (10-20% errors). Long reads can indeed be used alone at a high depth of coverage permitting autocorrection (Koren et al, 2017; Shafin et al, 2020) or in combination with short reads for (1) scaffolding short-read contigs (Armstrong et al, 2020; Kwan et al, 2019), (2) using short reads to polish long-read contigs (Batra et al, 2019b; Datema et al, 2016; Jansen et al, 2017; Michael et al, 2018), or (3) optimizing the assembly process by using information from both long and short reads (Díaz-Viraqué et al, 2019; Gan et al, 2019; Jiang et al, 2019; Kadobianskyi et al, 2019; Tan et al, 2018; Wang et al, 2020; Zimin et al, 2017). Given the previously demonstrated efficiency of the MaSuRCA tool for the assembly of large genomes (Scott et al, 2020; Wang et al, 2020; Zimin et al, 2017), we decided to rely on hybrid sequencing data combining the advantages of Illumina short-read and Nanopore long-read sequencing technologies.…”
Section: Discussion (mentioning)
confidence: 99%
“…This approach is particularly suitable for sequencing roadkill specimens for which it is notoriously difficult to obtain a large amount of high-quality DNA because of post-mortem DNA degradation processes. Furthermore, it is possible to correct errors in ONT long reads by combining them with Illumina short reads, either to polish de novo long read-based genome assemblies (Batra et al, 2019a; Jain et al, 2018; Nicholls et al, 2019; Walker et al, 2014) or to construct hybrid assemblies (Di Genova et al, 2018; Gan et al, 2019; Tan et al, 2018; Zimin et al, 2013). In hybrid assembly approaches, the accuracy of short reads with high depth of coverage (50-100x) allows the use of long reads at lower depth of coverage (10-30x) essentially for scaffolding (Armstrong et al, 2020; Kwan et al, 2019).…”
Section: Introduction (mentioning)
confidence: 99%
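The depth-of-coverage ranges quoted in the statement above (50-100x short reads, 10-30x long reads) translate into sequencing yield by simple multiplication: total bases ≈ genome size × coverage. The sketch below illustrates that arithmetic only. The 3 Gb genome size and the names `required_bases` and `GENOME_SIZE_BP` are assumptions made for illustration and are not taken from any of the cited papers.

```python
def required_bases(genome_size_bp: int, coverage: float) -> int:
    """Total sequenced bases needed to reach a target mean depth of coverage."""
    return round(genome_size_bp * coverage)


# Hypothetical genome size (3 Gb), chosen only to make the numbers concrete.
GENOME_SIZE_BP = 3_000_000_000

# Coverage ranges as quoted in the citation statement.
targets = {
    "short reads (50-100x)": (50, 100),
    "long reads (10-30x)": (10, 30),
}

for label, (low_cov, high_cov) in targets.items():
    low = required_bases(GENOME_SIZE_BP, low_cov)
    high = required_bases(GENOME_SIZE_BP, high_cov)
    # Report yield in gigabases (1 Gb = 1e9 bases).
    print(f"{label}: {low / 1e9:.0f}-{high / 1e9:.0f} Gb")
```

Under this assumed genome size, the long-read requirement is a fraction of the short-read one, which is consistent with the statement's point that long reads serve mainly for scaffolding in hybrid assemblies.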
“…We constructed indexed RNA-seq libraries using the NEBNext Ultra I or II library prep kits, followed by paired-end sequencing on an Illumina HiSeq 2500 (for samples collected from 2013 – 2016) or single-end on a HiSeq 4000 (for samples collected after 2016) to a mean depth of 17.4 million reads (± 7.7 million SD; Table S1). Trimmed reads were mapped to the Panubis 1.0 genome (GCA_008728515.1) using the STAR 2-pass aligner [112,113]. Finally, we generated gene-level counts using HTSeq and the Panubis1.0 annotation (GCF_008728515.1) [114].…”
Section: Methods (mentioning)
confidence: 99%
“…The genome annotation report and raw files can be found at [52]. All supporting data and materials are available in the GigaScience GigaDB database [53].…”
Section: Data Availability (mentioning)
confidence: 99%