2018
DOI: 10.1093/nar/gky726

Pushing the limits of de novo genome assembly for complex prokaryotic genomes harboring very long, near identical repeats

Abstract: Generating a complete, de novo genome assembly for prokaryotes is often considered a solved problem. However, we here show that Pseudomonas koreensis P19E3 harbors multiple, near identical repeat pairs up to 70 kilobase pairs in length, which contained several genes that may confer fitness advantages to the strain. Its complex genome, which also included a variable shufflon region, could not be de novo assembled with long reads produced by Pacific Biosciences’ technology, but required very long reads from Oxfo…

Citations: cited by 86 publications (90 citation statements)
References: 62 publications
“…The association of key adaptive traits with duplicated regions has also been reported for the p1 megaplasmid-carrying Pseudomonas koreensis strain P19E3 (64), with genes encoding heavy metal resistance, aromatic compound degradation, and DNA repair occurring within large repeats in the genome. Consistent with our observations, repetitive genes in p1 were identified close to transposase and phage genes suggesting a key role for mobile elements like transposons in shaping these large genome duplications (64).…”
Section: Discussion; citation type: supporting
confidence: 90%
“…The pRB16 megaplasmid was identified in P. citronellolis SJTE-3, an estrogen and polycyclic aromatic hydrocarbon degrading bacterium isolated from active sludge at a wastewater treatment plant in China (75). P. koreensis P19E3, harbouring the megaplasmid p1, was isolated from healthy marjoram (Origanum marjorana) leaf material during an isolation survey on an organic herb farm (Boppelsen, Switzerland) in 2014 (64). The presence on p1 of a cluster of genes encoding copper resistance suggests that whilst members of the megaplasmid family share a common backbone, they are flexible in the adaptive traits that they carry.…”
Section: Discussion; citation type: mentioning
confidence: 99%
“…An analysis of the repeat complexity of all publicly available, completely sequenced genomes of L. monocytogenes strains (status: March, 2018) revealed that almost 95% of the roughly 150 strains are so-called "class I" genomes, which are straightforward to assemble (few repeats, none longer than the rDNA operons of up to 7 kb) [17]. In contrast, eight strains also had few repeats overall, but those present were up to 11 kb in length [17]. Using long-read Pacific Biosciences (PacBio) sequencing data (including a BluePippin size selection step; see Materials and Methods) and the assembly algorithm HGAP3 [50], we were able to de novo assemble one complete chromosome for EGD-e (2.94 Mbp) and one for ScottA (3.03 Mbp) with PacBio read coverages of 260x and 280x, respectively (Fig 2).…”
Section: Complete Genome Sequences Of Egd-e and Scotta And Comparativ…; citation type: mentioning
confidence: 99%