Yeong Ouk Kim scite author profile

Species demarcation in Bacteria and Archaea is mainly based on overall genome relatedness, which serves as a framework for modern microbiology. Current practice for obtaining these measures between two strains is shifting from experimentally determined similarity obtained by DNA-DNA hybridization (DDH) to genome-sequence-based similarity. Average nucleotide identity (ANI) is a simple algorithm that mimics DDH. Like DDH, ANI values between two genome sequences may differ from each other when reciprocal calculations are compared. We compared 63 690 pairs of genome sequences and found that the differences in reciprocal ANI values are significantly high, exceeding 1 % in some cases. To resolve this lack of symmetry, a new algorithm, named OrthoANI, was developed to accommodate the concept of orthology: both genome sequences are fragmented, and only orthologous fragment pairs are taken into consideration when calculating nucleotide identities. OrthoANI is highly correlated with ANI (using BLASTn), and the former showed approximately 0.1 % higher values than the latter. In conclusion, OrthoANI provides a more robust and faster means of calculating average nucleotide identity for taxonomic purposes. The standalone software tools are freely available at http://www.ezbiocloud.net/sw/oat.
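The idea of fragmenting both genomes and averaging only over orthologous fragment pairs can be illustrated with a small Python sketch. This is not the published OrthoANI implementation: the function names, the tiny fragment size, and the ungapped position-wise identity (used here as a toy stand-in for a BLASTn alignment) are all assumptions made for illustration. Orthology is approximated by reciprocal best hits, which is what makes the result independent of the order of the two genomes.

```python
from typing import Dict, List, Optional, Tuple


def fragment(genome: str, size: int) -> List[str]:
    """Cut a sequence into consecutive fragments, dropping a short tail."""
    return [genome[i:i + size] for i in range(0, len(genome) - size + 1, size)]


def identity(a: str, b: str) -> float:
    """Toy stand-in for an alignment: ungapped, position-wise identity."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def best_hits(queries: List[str], subjects: List[str]) -> Dict[int, Tuple[int, float]]:
    """Map each query index to (best subject index, identity).

    Ties are broken in favour of the lowest subject index.
    """
    hits: Dict[int, Tuple[int, float]] = {}
    for qi, q in enumerate(queries):
        scored = [(identity(q, s), si) for si, s in enumerate(subjects)]
        score, si = max(scored, key=lambda t: (t[0], -t[1]))
        hits[qi] = (si, score)
    return hits


def ortho_ani_sketch(genome_a: str, genome_b: str, size: int = 4) -> Optional[float]:
    """Mean identity over reciprocal-best-hit fragment pairs (hypothetical helper)."""
    frags_a, frags_b = fragment(genome_a, size), fragment(genome_b, size)
    if not frags_a or not frags_b:
        return None
    a_to_b = best_hits(frags_a, frags_b)
    b_to_a = best_hits(frags_b, frags_a)
    # Keep a pair only when each fragment is the other's best hit.
    pair_ids = [score for ai, (bi, score) in a_to_b.items() if b_to_a[bi][0] == ai]
    return sum(pair_ids) / len(pair_ids) if pair_ids else None


if __name__ == "__main__":
    a, b = "ACGTACGTTTTT", "ACGAACGTTTTA"
    print(ortho_ani_sketch(a, b), ortho_ani_sketch(b, a))
```

Because a pair is kept only when the best-hit relation holds in both directions, swapping the two inputs selects the same pairs, so the two calls in the usage example agree. A realistic version would use real alignments, much longer fragments, and a minimum alignment-length filter.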

UBCG: Up-to-date bacterial core gene set and pipeline for phylogenomic tree reconstruction

Kim, Yoon, et al. 2018. J Microbiol.

Genome-based phylogeny plays a central role in the future taxonomy and phylogenetics of Bacteria and Archaea by replacing 16S rRNA gene phylogeny. The concatenated core gene alignments are frequently used for such a purpose. The bacterial core genes are defined as single-copy, homologous genes that are present in most of the known bacterial species. There have been several studies describing such a gene set, but the number of species considered was rather small. Here we present the up-to-date bacterial core gene set, named UBCG, and software suites that accommodate the steps necessary to generate and evaluate phylogenetic trees. The method was successfully used to infer the phylogenomic relationships of Escherichia and related taxa and can be used for a set of genomes at any taxonomic rank of Bacteria. The UBCG pipeline and file viewer are freely available at https://www.ezbiocloud.net/tools/ubcg and https://www.ezbiocloud.net/tools/ubcg_viewer, respectively.

Large-Scale Genomics Reveals the Genetic Characteristics of Seven Species and Importance of Phylogenetic Distance for Estimating Pan-Genome Size

et al. 2019

For more than a decade, pan-genome analysis has been applied as an effective method for explaining the variation in genetic content of prokaryotic species. However, genomic characteristics and detailed structures of gene pools have not been fully clarified, because most studies have used a small number of genomes. Here, we constructed pan-genomes of seven species in order to elucidate variations in the genetic contents of >27,000 genomes belonging to Streptococcus pneumoniae, Staphylococcus aureus subsp. aureus, Salmonella enterica subsp. enterica, Escherichia coli and Shigella spp., Mycobacterium tuberculosis complex, Pseudomonas aeruginosa, and Acinetobacter baumannii. This work showed that the pan-genomes of all seven species are open. Additionally, systematic evaluation of the characteristics of their pan-genomes revealed that phylogenetic distance provided valuable information for estimating the parameters for pan-genome size among several models, including Heaps’ law. Our results provide a better understanding of the species and a solution to minimize sampling biases associated with genome-sequencing preferences for pathogenic strains.
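As a rough illustration of the Heaps' law model mentioned above, the sketch below fits the commonly used power-law form P(N) = κ·N^γ, where P is the pan-genome size after N genomes, by least squares on log-transformed values. The function name and the input numbers are hypothetical; the data are synthetic values generated from an exact power law, not results from the study. The sketch also does not include the phylogenetic-distance information that the abstract describes.

```python
import math
from typing import Sequence, Tuple


def fit_heaps(n_genomes: Sequence[int], pan_sizes: Sequence[float]) -> Tuple[float, float]:
    """Fit P = kappa * N**gamma by ordinary least squares in log-log space.

    Returns (kappa, gamma). Hypothetical helper for illustration only.
    """
    xs = [math.log(n) for n in n_genomes]
    ys = [math.log(p) for p in pan_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx                      # slope of the log-log line
    kappa = math.exp(mean_y - gamma * mean_x)  # intercept, back-transformed
    return kappa, gamma


if __name__ == "__main__":
    # Synthetic input: an exact power law with kappa = 3000 and gamma = 0.3.
    counts = [1, 2, 4, 8, 16, 32]
    sizes = [3000 * n ** 0.3 for n in counts]
    kappa, gamma = fit_heaps(counts, sizes)
    print(f"kappa={kappa:.1f}, gamma={gamma:.3f}")
```

Under this form, a positive exponent γ means the fitted pan-genome size keeps growing as genomes are added, which is the usual sense in which a pan-genome is called open.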

Improved Metagenomic Taxonomic Profiling Using a Curated Core Gene-Based Bacterial Database Reveals Unrecognized Species in the Genus Streptococcus

Chalita, Ha, Kim, et al. 2020. Pathogens

Shotgun metagenomics is of great importance for understanding the composition of the microbial community associated with a sample and the potential impact it may exert on its host. For clinical metagenomics, one of the initial challenges is the accurate identification of a pathogen of interest and the ability to single out that pathogen within a complex community of microorganisms. However, in the absence of an accurate identification of those microorganisms, any conclusion or diagnosis based on misidentification may be erroneous, especially when comparing distinct groups of individuals. When comparing a shotgun metagenomic sample against a reference genome sequence database, the classification itself is dependent on the contents of the database. Focusing on the genus Streptococcus, we built four synthetic metagenomic samples and demonstrated that shotgun taxonomic profiling using the bacterial core genes as the reference database performed better in both taxonomic profiling and relative abundance prediction than that based on the marker gene reference database included in MetaPhlAn2. Additionally, by classifying sputum samples of patients suffering from chronic obstructive pulmonary disease, we showed that adding genomes of genomospecies to a reference database offers higher taxonomic resolution for taxonomic profiling. Finally, we show how our genomospecies database is able to correctly identify a clinical stool sample from a patient with a streptococcal infection, proving that genomospecies provide better taxonomic coverage for metagenomic analyses.

Phase transformation of metastable beta Ti alloys with tweed structure

Choe, Shin, Kim, et al. 2005. Met. Mater. Int.

Large scale genomic and evolutionary study reveals SARS-CoV-2 virus isolates from Bangladesh strongly correlate with European origin and not with China

Rabbi, Khan, Hasan, et al. 2021. Preprint

Rationale: Global public health is in serious crisis due to the emergence of the SARS-CoV-2 virus. Studies are ongoing to reveal the genomic variants of the virus circulating in various parts of the world. However, data generated from low- and middle-income countries are scarce due to resource limitations. This study performed whole genome sequencing of 151 SARS-CoV-2 isolates from COVID-19 positive Bangladeshi patients. The goal of this study was to identify the genomic variants among the SARS-CoV-2 virus isolates in Bangladesh, to determine the molecular epidemiology, and to relate host clinical traits to the viral genomic variants. Method: Suspected patients were tested for COVID-19 using a one-step commercial qPCR kit for SARS-CoV-2 virus. Viral RNA was extracted from positive patients and converted to cDNA, which was amplified using the Ion AmpliSeq™ SARS-CoV-2 Research Panel. Massively parallel sequencing was carried out using the Ion AmpliSeq™ Library Kit Plus. Raw data were assembled by aligning the reads to a pre-defined reference genome (NC_045512.2) while retaining the unique variations of the input raw data by creating a consensus genome. A random forest-based association analysis was carried out to correlate the viral genomic variants with the clinical traits present in the host. Result: Among the 151 viral isolates, we observed 413 unique variants. Among these, 8 variants occurred in more than 80 % of cases: 241C to T, 1163A to T, 3037C to T, 14408C to T, 23403A to G, 28881G to A, 28882G to A, and 28883G to C. Phylogenetic analysis revealed a predominance of variants belonging to the GR clade, which has a strong geographical presence in Europe, indicating possible introduction of the SARS-CoV-2 virus into Bangladesh through a European channel. However, other possibilities, such as a route of entry from China, cannot be ruled out, as a viral isolate belonging to the L clade with a close relationship to the Wuhan reference genome was also detected. We observed a total of 37 genomic variants to be strongly associated with clinical symptoms such as fever, sore throat, overall symptomatic status, etc. (Fisher’s Exact Test p-value < 0.05). The most noteworthy among those were 3916C to T (associated with causing sore throat, p-value 0.0005), 14408C to T (associated with protection from developing cough, p-value 0.027), and the 28881G to A, 28882G to A, and 28883G to C variants (associated with causing chest pain, p-value 0.025). Conclusion: To our knowledge, this study is the first large-scale phylogenomic study of the SARS-CoV-2 virus circulating in Bangladesh. The observed epidemiological and genomic features may inform future research platforms for disease management, vaccine development, and epidemiological study.

Yeong Ouk Kim

OrthoANI: An improved algorithm and software for calculating average nucleotide identity
