Species demarcation in Bacteria and Archaea is mainly based on overall genome relatedness, which serves as a framework for modern microbiology. Current practice for obtaining these measures between two strains is shifting from experimentally determined similarity obtained by DNA-DNA hybridization (DDH) to genome-sequence-based similarity. Average nucleotide identity (ANI) is a simple algorithm that mimics DDH. As with DDH, the ANI values between two genome sequences may differ when reciprocal calculations are compared. We compared 63 690 pairs of genome sequences and found that the differences between reciprocal ANI values can be substantial, exceeding 1 % in some cases. To resolve this lack of symmetry, a new algorithm, named OrthoANI, was developed around the concept of orthology: both genome sequences are fragmented and only orthologous fragment pairs are taken into consideration when calculating nucleotide identities. OrthoANI is highly correlated with ANI (using BLASTn), with the former showing approximately 0.1 % higher values than the latter. In conclusion, OrthoANI provides a more robust and faster means of calculating average nucleotide identity for taxonomic purposes. The standalone software tools are freely available at http://www.ezbiocloud.net/sw/oat.
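The asymmetry described above, and the effect of keeping only reciprocal fragment pairs, can be illustrated with a toy sketch. This is not the published OrthoANI implementation: the fragment names and identity values are invented, the hit lists stand in for output from an aligner such as BLASTn, and "orthologous" is approximated here as a reciprocal best hit, which is an assumption of this example.

```python
from statistics import mean


def best_hits(hits):
    """Map each query fragment to its highest-identity subject fragment."""
    best = {}
    for query, subject, identity in hits:
        if query not in best or identity > best[query][1]:
            best[query] = (subject, identity)
    return best


def one_way_ani(hits):
    """Average identity over the best hit of every query fragment."""
    return mean(identity for _, identity in best_hits(hits).values())


def reciprocal_ani(hits_ab, hits_ba):
    """Average identity over reciprocal best-hit fragment pairs only."""
    best_ab = best_hits(hits_ab)
    best_ba = best_hits(hits_ba)
    identities = []
    for a, (b, identity_ab) in best_ab.items():
        # Keep the pair only if b's best hit points back to a.
        if b in best_ba and best_ba[b][0] == a:
            identities.append((identity_ab + best_ba[b][1]) / 2)
    if not identities:
        raise ValueError("no reciprocal best-hit pairs found")
    return mean(identities)


# Hypothetical (query, subject, percent identity) hits in each direction.
hits_ab = [("a1", "b1", 98.0), ("a2", "b2", 96.0), ("a3", "b1", 90.0)]
hits_ba = [("b1", "a1", 98.2), ("b2", "a2", 96.0), ("b3", "a1", 88.0)]

forward = one_way_ani(hits_ab)
reverse = one_way_ani(hits_ba)
symmetric = reciprocal_ani(hits_ab, hits_ba)
print(forward, reverse, symmetric)
```

In this toy data the two one-way averages disagree because each direction includes a fragment whose best hit is not reciprocated, while the reciprocal value is the same whichever genome is treated as the query.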
Genome-based phylogeny is expected to play a central role in the future taxonomy and phylogenetics of Bacteria and Archaea by replacing 16S rRNA gene phylogeny. Concatenated core gene alignments are frequently used for this purpose. The bacterial core genes are defined as single-copy, homologous genes that are present in most of the known bacterial species. Several studies have described such a gene set, but the number of species considered was rather small. Here we present an up-to-date bacterial core gene set, named UBCG, and software suites that cover the steps needed to generate and evaluate phylogenetic trees. The method was successfully used to infer the phylogenomic relationships of Escherichia and related taxa and can be used for sets of genomes at any taxonomic rank of Bacteria. The UBCG pipeline and file viewer are freely available at https://www.ezbiocloud.net/tools/ubcg and https://www.ezbiocloud.net/tools/ubcg_viewer, respectively.
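The idea of a concatenated core gene alignment can be sketched as follows. This is a minimal illustration, not the UBCG pipeline: the gene and genome names are hypothetical, the per-gene alignments are assumed to have been produced already, and padding a missing gene with gap characters is a design choice made for this example.

```python
def concatenate_alignments(gene_alignments, genomes):
    """Join per-gene alignments into one sequence per genome.

    gene_alignments maps gene name -> {genome name -> aligned sequence}.
    A genome lacking a gene receives a run of gaps of that gene's width.
    """
    parts = {genome: [] for genome in genomes}
    for gene in sorted(gene_alignments):  # fixed gene order for every genome
        aligned = gene_alignments[gene]
        widths = {len(seq) for seq in aligned.values()}
        if len(widths) != 1:
            raise ValueError(f"sequences for {gene} are not the same length")
        width = widths.pop()
        for genome in genomes:
            parts[genome].append(aligned.get(genome, "-" * width))
    return {genome: "".join(chunks) for genome, chunks in parts.items()}


# Hypothetical toy alignments for two core genes across three genomes.
alignments = {
    "geneA": {"g1": "ATG-", "g2": "ATGC"},
    "geneB": {"g1": "CC", "g2": "CT", "g3": "CC"},
}
supermatrix = concatenate_alignments(alignments, ["g1", "g2", "g3"])
for name, seq in supermatrix.items():
    print(name, seq)
```

Every genome ends up with a sequence of the same length, which is what tree-building software expects from a concatenated alignment.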
For more than a decade, pan-genome analysis has been applied as an effective method for explaining variation in the genetic content of prokaryotic species. However, the genomic characteristics and detailed structures of gene pools have not been fully clarified, because most studies have used a small number of genomes. Here, we constructed pan-genomes of seven species in order to elucidate variations in the genetic contents of >27,000 genomes belonging to Streptococcus pneumoniae, Staphylococcus aureus subsp. aureus, Salmonella enterica subsp. enterica, Escherichia coli and Shigella spp., Mycobacterium tuberculosis complex, Pseudomonas aeruginosa, and Acinetobacter baumannii. This work showed that the pan-genomes of all seven species are open. Additionally, systematic evaluation of the characteristics of these pan-genomes revealed that phylogenetic distance provided valuable information for estimating the parameters of pan-genome size under several models, including Heaps’ law. Our results provide a better understanding of the species and a way to minimize the sampling biases associated with genome-sequencing preferences for pathogenic strains.
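As an illustration of how a Heaps' law model can be fitted to pan-genome size, the sketch below assumes the power-law form P(N) = κ·N^γ, where P is the pan-genome size after N genomes, and estimates κ and γ by least squares on log-transformed values. The abstract does not state which parameterization or fitting procedure the authors used, and the data here are synthetic, so this is only a hedged example of the general technique.

```python
import math


def fit_heaps(genome_counts, pan_sizes):
    """Fit P(N) = kappa * N**gamma by linear regression in log-log space."""
    xs = [math.log(n) for n in genome_counts]
    ys = [math.log(p) for p in pan_sizes]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx                      # slope = growth exponent
    kappa = math.exp(mean_y - gamma * mean_x)  # intercept = scale factor
    return kappa, gamma


# Synthetic data generated from kappa = 3000, gamma = 0.4 (not real genomes).
counts = [1, 2, 5, 10, 50, 100]
sizes = [3000 * n ** 0.4 for n in counts]

kappa, gamma = fit_heaps(counts, sizes)
print(kappa, gamma)
```

Under this parameterization a positive exponent means the pan-genome keeps growing as genomes are added, which corresponds to the "open" property mentioned above.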
Shotgun metagenomics is of great importance for understanding the composition of the microbial community associated with a sample and the potential impact it may exert on its host. For clinical metagenomics, one of the initial challenges is the accurate identification of a pathogen of interest and the ability to single out that pathogen within a complex community of microorganisms. Without accurate identification of those microorganisms, any conclusion or diagnosis may be erroneous, especially when comparing distinct groups of individuals. When comparing a shotgun metagenomic sample against a reference genome sequence database, the classification itself depends on the contents of the database. Focusing on the genus Streptococcus, we built four synthetic metagenomic samples and demonstrated that shotgun taxonomic profiling using the bacterial core genes as the reference database performed better in both taxonomic profiling and relative abundance prediction than profiling based on the marker gene reference database included in MetaPhlAn2. Additionally, by classifying sputum samples from patients suffering from chronic obstructive pulmonary disease, we showed that adding genomes of genomospecies to a reference database offers higher taxonomic resolution for taxonomic profiling. Finally, we show how our genomospecies database is able to correctly identify a clinical stool sample from a patient with a streptococcal infection, indicating that genomospecies provide better taxonomic coverage for metagenomic analyses.
Rationale: Global public health is in serious crisis due to the emergence of the SARS-CoV-2 virus. Studies are ongoing to reveal the genomic variants of the virus circulating in various parts of the world. However, data generated from low- and middle-income countries are scarce due to resource limitations. This study performed whole genome sequencing of 151 SARS-CoV-2 isolates from COVID-19 positive Bangladeshi patients. The goal was to identify the genomic variants among the SARS-CoV-2 isolates in Bangladesh, to determine the molecular epidemiology, and to relate host clinical traits to the viral genomic variants. Method: Suspected patients were tested for COVID-19 using a one-step commercial qPCR kit for SARS-CoV-2. Viral RNA was extracted from positive patients and converted to cDNA, which was amplified using the Ion AmpliSeq™ SARS-CoV-2 Research Panel. Massive parallel sequencing was carried out using the Ion AmpliSeq™ Library Kit Plus. Raw data were assembled by aligning the reads to a pre-defined reference genome (NC_045512.2) while retaining the unique variations of the input raw data by creating a consensus genome. A random forest-based association analysis was carried out to correlate the viral genomic variants with the clinical traits present in the host. Result: Among the 151 viral isolates, we observed 413 unique variants. Of these, 8 variants occurred in more than 80 % of cases: 241C to T, 1163A to T, 3037C to T, 14408C to T, 23403A to G, 28881G to A, 28882G to A, and 28883G to C. Phylogenetic analysis revealed a predominance of variants belonging to the GR clade, which has a strong geographical presence in Europe, indicating possible introduction of the SARS-CoV-2 virus into Bangladesh through a European channel. However, other possibilities, such as a route of entry from China, cannot be ruled out, as a viral isolate belonging to the L clade with a close relationship to the Wuhan reference genome was also detected. We observed a total of 37 genomic variants to be strongly associated with clinical symptoms such as fever, sore throat, overall symptomatic status, etc. (Fisher’s Exact Test p-value < 0.05). The most noteworthy among those were 3916C to T (associated with causing sore throat, p-value 0.0005), 14408C to T (associated with protection from developing cough, p-value 0.027), and the 28881G to A, 28882G to A, and 28883G to C variants (associated with causing chest pain, p-value 0.025). Conclusion: To our knowledge, this study is the first large-scale phylogenomic study of SARS-CoV-2 virus circulating in Bangladesh. The observed epidemiological and genomic features may inform future research on disease management, vaccine development and epidemiological study.
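The variant-symptom associations above are reported with Fisher's exact test. The sketch below shows how a two-sided p-value of that kind can be computed for a single 2×2 table of variant presence against symptom presence. The counts are hypothetical and are not taken from the study, and the function is a plain standard-library implementation written for this example rather than the authors' code.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: variant present / absent. Columns: symptom present / absent.
    Sums the probabilities of all tables (with the same margins) that are
    no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    low = max(0, col1 - row2)
    high = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(low, high + 1)
               if prob(x) <= observed * (1 + 1e-7))


# Hypothetical counts: 8 of 10 variant carriers report the symptom,
# versus 1 of 6 non-carriers.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(p_value)
```

When many variants are tested against many symptoms, as in a study of this kind, a multiple-testing correction would normally be considered. The abstract does not say whether one was applied.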