The ultimate goal of diploid genome determination is to decode each homologous chromosome completely and independently, and several programs that phase haplotypes from consensus sequences have been developed. These methods work well for genomes with low heterozygosity, but many species have highly heterozygous genomes. Additionally, there are highly divergent regions (HDRs), where the haplotype sequences differ considerably. Because HDRs are likely to direct various interesting biological phenomena, many genomic analysis targets fall within these regions. However, HDRs cannot be accessed by existing phasing methods, so costly traditional methods have to be adopted. Here, we develop a de novo haplotype assembler, Platanus-allee (http://platanus.bio.titech.ac.jp/platanus2), which initially constructs each haplotype sequence and then untangles the assembly graphs utilizing sequence links and synteny information. A comprehensive benchmark analysis reveals that Platanus-allee exhibits high recall and precision, particularly for HDRs. Using this approach, previously unknown HDRs are detected in the human genome, which may uncover novel aspects of genome variability.
Bacteria are highly diverse, even within a species; thus, many studies have classified a single species into multiple types and analyzed the genetic differences between them. Recently, whole-genome sequencing (WGS) has become popular for these analyses, and the identification of single-nucleotide polymorphisms (SNPs) between isolates is the most basic analysis performed following WGS. The performance of SNP-calling methods therefore has a significant effect on the accuracy of downstream analyses, such as phylogenetic tree inference. In particular, when closely related isolates are analyzed, e.g. in outbreak investigations, some SNP callers tend to detect a high number of false-positive SNPs compared with the limited number of true SNPs among isolates. However, the performance of various SNP callers in such a situation has not been validated sufficiently. Here, we show the results of realistic benchmarks of commonly used SNP callers, revealing that some of them exhibit markedly low accuracy when target isolates are closely related. As an alternative, we developed a novel pipeline, BactSNP, which utilizes both assembly and mapping information and is capable of highly accurate and sensitive SNP calling in a single step. BactSNP is also able to call SNPs among isolates when the reference genome is a draft one or even when the user does not input a reference genome. BactSNP is available at https://github.com/IEkAdN/BactSNP.
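To make concrete why false positives matter when the true SNP count is small, the following is a minimal sketch of counting pairwise SNP differences among already aligned isolates. It is not BactSNP's method; the function names, the toy sequences, and the choice to skip any position that is not an unambiguous A/C/G/T base are assumptions made for illustration.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count aligned positions where both bases are unambiguous and differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    valid = set("ACGT")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x in valid and y in valid and x != y
    )


def pairwise_snp_distances(alignment: dict[str, str]) -> dict[tuple[str, str], int]:
    """Return the SNP distance for every unordered pair of isolates."""
    return {
        (p, q): snp_distance(alignment[p], alignment[q])
        for p, q in combinations(sorted(alignment), 2)
    }


# Hypothetical toy alignment: three closely related isolates.
toy = {
    "isolate_A": "ACGTACGTAC",
    "isolate_B": "ACGTACGTAT",
    "isolate_C": "ACGAACGTNT",
}
distances = pairwise_snp_distances(toy)
for pair, d in distances.items():
    print(pair, d)
```

With distances this small, a single spurious call changes a pairwise distance by a large relative amount, which is the situation the benchmarks above address.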
Serratia marcescens is an important nosocomial pathogen causing various opportunistic infections, such as urinary tract infections and bacteremia, and sometimes even hospital outbreaks. The recent emergence and spread of multidrug-resistant (MDR) strains further pose serious threats to global public health. This bacterium is also ubiquitously found in natural environments, but the genomic differences between clinical and environmental isolates are not clear, including those between S. marcescens and its close relatives. In this study, we performed a large-scale genome analysis of S. marcescens and closely related species (referred to as the ‘S. marcescens complex’), including more than 200 clinical and environmental strains newly sequenced here. Our analysis revealed their phylogenetic relationships and complex global population structure, comprising 14 clades, which were defined based on whole-genome average nucleotide identity. Clades 10, 11, 12 and 13 corresponded to S. nematodiphila, S. marcescens sensu stricto, S. ureilytica and S. surfactantfaciens, respectively. Several clades exhibited distinct genome sizes and GC contents, and a negative correlation between these genomic parameters was observed in each clade. This correlation was associated with the acquisition of mobile genetic elements (MGEs), although different types of MGEs, plasmids or prophages (and other integrative elements), were found to contribute to the generation of these genomic variations. Importantly, clades 1 and 2 mostly comprised clinical or hospital environment isolates and accumulated a wide range of antimicrobial resistance genes, including various extended-spectrum β-lactamase and carbapenemase genes, and fluoroquinolone target site mutations, leading to a high proportion of MDR strains. This finding suggests that clades 1 and 2 represent hospital-adapted lineages in the S. marcescens complex, although their potential virulence is currently unknown. These data provide an important genomic basis for reconsidering the classification of this group of bacteria and reveal novel insights into their evolution, biology and differential importance in clinical settings.
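As an illustration of grouping genomes by average nucleotide identity (ANI), the sketch below applies single-linkage clustering with a fixed cutoff. The abstract does not state how the 14 clades were delimited, so the linkage rule, the 95.0 cutoff, the genome labels, and the ANI values are all hypothetical.

```python
def cluster_by_ani(genomes, ani, threshold):
    """Single-linkage grouping: link two genomes when their ANI >= threshold."""
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent pointers to the group representative (with path halving).
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), value in ani.items():
        if value >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for g in genomes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical pairwise ANI values (percent) for four genomes.
genomes = ["g1", "g2", "g3", "g4"]
ani = {
    ("g1", "g2"): 98.7,
    ("g1", "g3"): 91.0,
    ("g1", "g4"): 90.8,
    ("g2", "g3"): 90.5,
    ("g2", "g4"): 90.2,
    ("g3", "g4"): 97.9,
}
clades = cluster_by_ani(genomes, ani, threshold=95.0)
print(clades)
```

Single linkage is only one possible choice; a stricter rule (for example, requiring every pair within a group to exceed the cutoff) can split groups that this sketch would merge.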
De novo assembly of short DNA reads remains an essential technology, especially for large-scale projects and high-resolution variant analyses in epidemiology. However, the existing tools often lack the accuracy required to compare closely related strains. To facilitate such studies on bacterial genomes, we developed Platanus_B, a de novo assembler that employs iterations of multiple error-removal algorithms. The benchmarks demonstrated the superior accuracy and high contiguity of Platanus_B, in addition to its ability to enhance the hybrid assembly of both short and nanopore long reads. Although the hybrid strategies for short and long reads were effective in achieving near full-length genomes, we found that short read-only assemblies generated with Platanus_B were sufficient to obtain ≥90% of exact coding sequences in most cases. In addition, while nanopore long read-only assemblies lacked fine-scale accuracy, inclusion of short reads was effective in improving the accuracy. Platanus_B can, therefore, be used for comprehensive genomic surveillance of bacterial pathogens and high-resolution phylogenomic analyses of a wide range of bacteria.
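One way to read the "≥90% of exact coding sequences" figure is as the fraction of reference coding sequences that appear in the assembly without any mismatch. The sketch below computes that fraction by exact substring search on either strand. The abstract does not describe the evaluation procedure, so this definition, the function names, and the toy sequences are assumptions.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def exact_cds_fraction(cds_list: list[str], contigs: list[str]) -> float:
    """Fraction of coding sequences found exactly (either strand) in any contig."""
    if not cds_list:
        raise ValueError("at least one coding sequence is required")
    hits = 0
    for cds in cds_list:
        rc = reverse_complement(cds)
        if any(cds in contig or rc in contig for contig in contigs):
            hits += 1
    return hits / len(cds_list)


# Hypothetical toy data: three reference CDSs and two assembled contigs.
reference_cds = ["ATGGCCTAA", "ATGAAATAG", "ATGCCCTGA"]
assembled_contigs = ["GGATGGCCTAACC", "TTCTATTTCATGG"]
fraction = exact_cds_fraction(reference_cds, assembled_contigs)
print(f"{fraction:.3f}")
```

Here the first CDS matches on the forward strand, the second on the reverse strand, and the third is absent, so two of three are recovered exactly. Substring search is quadratic in the worst case and is meant only to show the definition, not to scale to whole genomes.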
AIM: The single nucleotide polymorphism (SNP) c.415C>T in exon 3 of NUDT15 affects thiopurine-induced leukopenia in Asian patients with Crohn’s disease. Meanwhile, three additional genetic variants of NUDT15 were reported in patients with acute lymphoblastic leukemia. We evaluated the effects of these additional genetic variants of NUDT15 in patients with inflammatory bowel disease (IBD) treated with thiopurines. METHODS: Ninety-six Japanese patients with IBD were enrolled. Genotyping for the NUDT15 and TPMT genes was performed using Custom TaqMan SNP genotyping assays or Sanger sequencing. The changes in white blood cell (WBC) count, mean corpuscular volume (MCV), platelet count, hemoglobin, CRP, amylase, albumin, AST, ALT, and ESR were evaluated. RESULTS: Genetic variants of exon 1 and exon 3 of NUDT15 were identified in 24 of 96 patients (25.0%). c.52G>A and c.36_37insGGAGTC in exon 1 were found in three patients each. All three patients with c.36_37insGGAGTC in exon 1 were heterozygotes of p.Arg139Cys in exon 3. Eighteen patients had p.Arg139Cys in exon 3 alone. The WBC count gradually decreased after initiation of thiopurine treatment in the mutated cases (n = 24), and was significantly lower at 6, 8, 10, and 16 wk (P = 0.0271, 0.0037, 0.0051, and 0.0185, respectively). The WBC counts were also evaluated in patients with and without prednisolone treatment. In the patients with prednisolone treatment, the WBC count tended to show a greater decrease in the mutated cases, with significant differences at 8 and 10 wk (P = 0.012 and 0.029, respectively). In the patients without prednisolone treatment, the WBC count was significantly lower at 2, 4, 8, and 14 wk in mutated cases (P = 0.0196, 0.0182, 0.0237 and 0.0241, respectively). MCV increased after starting thiopurine treatment in the mutated cases, and was significantly higher at 10 wk (P = 0.0085). Platelet count, hemoglobin, CRP, amylase, albumin, AST, ALT and ESR did not differ significantly between the wild-type and mutated cases. TPMT mutations were not found in any of the patients. CONCLUSION: Mutations in exon 1 of NUDT15 also affect thiopurine-induced leukopenia in patients with IBD. To discuss thiopurine-induced leukopenia in more detail, investigation of SNPs in both exon 1 and exon 3 of NUDT15 is needed.
Rickettsiae are obligate intracellular bacteria that have small genomes as a result of reductive evolution. Many Rickettsia species of the spotted fever group (SFG) cause tick-borne diseases known as “spotted fevers”. The life cycle of SFG rickettsiae is closely associated with that of the tick, which is generally thought to act as a bacterial vector and reservoir that maintains the bacterium through transstadial and transovarial transmission. Each SFG member is thought to have adapted to a specific tick species, thus restricting the bacterial distribution to a relatively limited geographic region. These unique features of SFG rickettsiae allow investigation of how the genomes of such biologically and ecologically specialized bacteria evolve after genome reduction and the types of population structures that are generated. Here, we performed a nationwide, high-resolution phylogenetic analysis of Rickettsia japonica, an etiological agent of Japanese spotted fever that is distributed in Japan and Korea. The comparison of complete or nearly complete sequences obtained from 31 R. japonica strains isolated from various sources in Japan over the past 30 years demonstrated an extremely low level of genomic diversity. In particular, only 34 single nucleotide polymorphisms were identified among the 27 strains of the major lineage containing all clinical isolates and tick isolates from the three tick species. Our data provide novel insights into the biology and genome evolution of R. japonica, including the possibilities of recent clonal expansion and a long generation time in nature due to the long dormant phase associated with tick life cycles.