Genotyping methods and genome sequencing are indispensable for revealing the genomic structure of bacterial species that display a high level of genome plasticity. However, genome reconstruction (assembly) is not straightforward because of data complexity, including the repeats and the mobile and accessory genetic elements of bacterial genomes. Moreover, since the solution to this problem is strongly influenced by sequencing technology, bioinformatics pipelines, and the selection criteria used to assess assemblers, there is no systematic way to select a priori the optimal assembler and parameter settings. To assemble the genome of Pseudomonas aeruginosa strain AG1 (PaeAG1), short-read (Illumina) and long-read (Oxford Nanopore) sequencing data were used in 13 different non-hybrid and hybrid approaches. PaeAG1 is a multiresistant high-risk sequence type 111 (ST-111) clone that was isolated from a Costa Rican hospital, and it was the first reported P. aeruginosa isolate carrying both the blaVIM-2 and blaIMP-18 genes, which encode metallo-β-lactamase (MBL) enzymes. To assess the assemblies, multiple metrics of contiguity, correctness and completeness (the 3C criterion, as we define it here) were used to benchmark the 13 approaches and to select a definitive assembly. In addition, annotation was performed to identify genes (coding and RNA regions) and to describe the genomic content of PaeAG1. Whereas long-read and hybrid approaches performed better in terms of contiguity, higher correctness and completeness metrics were obtained for short-read-only and hybrid approaches. A manually curated and polished hybrid assembly gave rise to a single circular sequence with 100% of core genes and known regions identified, >98% of reads mapped back, no gaps, and uniform coverage. The strategy followed to obtain this high-quality 3C assembly is detailed in the manuscript, and we provide readers with an all-in-one script to replicate our results or to apply the strategy to other troublesome cases.
The final 3C assembly revealed that the PaeAG1 genome has 7,190,208 bp, a 65.7% GC content and 6,709 genes (6,620 coding sequences), many of which are located in mobile genomic elements, including 57 genomic islands, six prophages, and two complete integrons carrying the blaVIM-2 and blaIMP-18 MBL genes. Up to 250 and 60 of the predicted genes are anticipated to play a role in virulence (adherence, quorum sensing and secretion) and antibiotic resistance (β-lactamases, efflux pumps, etc.), respectively. Altogether, the assembly and annotation of the PaeAG1 genome provide new perspectives for continuing to study the genomic diversity and gene content of this important human pathogen. Genotyping methods and genome sequencing are indispensable for revealing the genomic structure and evolution of bacterial clones with high resolution [1]. In this sense, the production of large amounts of short sequencing data (reads) from genomes has been facilitated by continuous advances in Next Generation Sequencing (NGS) technologies.
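As a rough illustration of the contiguity side of the 3C criterion, the sketch below computes N50 and GC content for a set of contigs. The abstract does not list the exact metrics used, so choosing N50 here is an assumption, and the function names `n50` and `gc_content` are hypothetical; this is not the authors' all-in-one script.

```python
def n50(contig_lengths):
    """Return the N50: the largest length L such that contigs of
    length >= L together cover at least half of the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def gc_content(sequence):
    """Return the GC percentage, ignoring any non-ACGT symbols (e.g. N)."""
    seq = sequence.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


# A fragmented toy assembly versus a single closed contig.
print(n50([100, 200, 300, 400]))   # 300
print(n50([7190208]))              # 7190208
print(round(gc_content("GGCCAT"), 1))
```

For a single circular sequence, as in the final PaeAG1 assembly, N50 simply equals the genome length, which is why contiguity alone cannot rank closed assemblies and correctness and completeness metrics are also needed.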
Genome sequencing is a key strategy in the surveillance of SARS-CoV-2, the virus responsible for the COVID-19 pandemic. Latin America is the hardest-hit region of the world, accumulating almost 20% of COVID-19 cases worldwide. In Costa Rica, almost 170,000 cases were reported from the first detected case on March 6th to December 31st, 2020. We analyzed the genomic variability during the SARS-CoV-2 pandemic in Costa Rica using 185 sequences, 52 from the first months of the pandemic and 133 from the current wave. Three GISAID clades (G, GH, and GR) and three PANGOLIN lineages (B.1, B.1.1, and B.1.291) were predominant, suggesting multiple re-introductions from other regions. The whole-genome variant calling analysis identified a total of 283 distinct nucleotide variants following a power-law distribution: 190 single-nucleotide mutations were each found in a single sequence, and only 16 mutations were found in >5% of sequences. These mutations were distributed throughout the whole genome. The prevalence of the worldwide-found variants D614G in the Spike (98.9% in Costa Rica) and ORF8 L84S (1.1%) is similar to what is found elsewhere. Interestingly, the frequency of the mutation T1117I in the Spike has increased during the current pandemic wave, which began in May 2020 in Costa Rica, reaching 29.2% detection in the full-genome analyses in November 2020. This variant has been observed in less than 1% of the sequences reported to GISAID worldwide in 2020. Structural modeling of the Spike protein with the T1117I mutation suggests a potential effect on the viral oligomerization needed for cell infection, but no differences from other genomes in transmissibility, severity, or vaccine effectiveness are predicted.
In conclusion, genome analyses of the SARS-CoV-2 sequences over the course of the COVID-19 pandemic in Costa Rica suggest the introduction of lineages from other countries and detect mutations in line with other studies, while highlighting the local increase in detection of the Spike-T1117I variant. The genomic features of this virus need to be monitored and studied in further analyses as part of the surveillance program during the pandemic.
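The variant summary reported above (distinct variants, variants seen in a single sequence, and variants above a 5% prevalence cut-off) can be sketched as follows. This is a minimal sketch under stated assumptions: the input format (a dictionary mapping each sequence ID to its set of called variants), the function name `summarize_variants`, and the toy sample IDs are all hypothetical and are not taken from the study.

```python
from collections import Counter


def summarize_variants(calls, threshold=0.05):
    """Summarize per-variant prevalence across sequences.

    calls: dict mapping sequence ID -> iterable of variant labels.
    threshold: prevalence cut-off (fraction of sequences).
    """
    n = len(calls)
    if n == 0:
        raise ValueError("no sequences supplied")
    # Count each variant at most once per sequence.
    counts = Counter(v for variants in calls.values() for v in set(variants))
    return {
        "n_distinct": len(counts),
        "singletons": sorted(v for v, c in counts.items() if c == 1),
        "common": {v: c / n for v, c in counts.items() if c / n > threshold},
    }


# Toy data only; real input would come from a variant-calling step.
toy_calls = {
    "seq1": {"S:D614G", "S:T1117I"},
    "seq2": {"S:D614G"},
    "seq3": {"S:D614G", "ORF8:L84S"},
    "seq4": {"S:D614G"},
}
summary = summarize_variants(toy_calls, threshold=0.3)
print(summary["n_distinct"])   # 3
print(summary["singletons"])   # ['ORF8:L84S', 'S:T1117I']
print(summary["common"])       # {'S:D614G': 1.0}
```

Tracking the same prevalence per month, rather than over the whole data set, would be one way to follow a rising variant such as Spike-T1117I.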
Concomitant infection or co-infection with distinct SARS-CoV-2 genotypes has been reported as part of the epidemiological surveillance of the COVID-19 pandemic. In the context of the spread of more transmissible variants during 2021, co-infections are important not only because of possible changes in the clinical outcome, but also because of the chance of generating new genotypes by recombination. However, few studies have developed bioinformatic pipelines to identify co-infections. Here we present a metagenomic pipeline based on the inference of multiple fragments similar to amplicon sequence variants (ASV-like) from sequencing data and on a custom SARS-CoV-2 database, to identify the concomitant presence of divergent SARS-CoV-2 genomes, i.e., variants of concern (VOCs). This approach was compared to another strategy based on whole-genome (metagenome) assembly. Using sequencing data sets, singly or in pairs, from COVID-19 cases with distinct SARS-CoV-2 VOCs, each approach was used to predict the VOC classes (Alpha, Beta, Gamma, Delta, Omicron or non-VOC, and their combinations). The performance of each pipeline was assessed against the ground-truth (expected) VOC classes. Subsequently, the ASV-like pipeline was used to analyze 1021 cases of COVID-19 from Costa Rica to investigate the possible occurrence of co-infections. The ASV-like inference approach achieved an accuracy of 96.2%, in contrast to the frequent misclassification (accuracy 46.2%) observed for the whole-genome assembly strategy. The custom SARS-CoV-2 database used for the ASV-like analysis can be updated as new VOCs appear, to track co-infections with eventual new genotypes. In addition, the application of the ASV-like approach to all 1021 samples sequenced in Costa Rica between October 12th and December 21st, 2021 found that none corresponded to co-infections with VOCs.
In conclusion, we developed a metagenomic pipeline based on ASV-like inference for the identification of co-infection with distinct SARS-CoV-2 VOCs, which achieved outstanding accuracy. Given the epidemiological, clinical, and molecular relevance of concomitant infection with distinct genotypes, this work contributes another piece to the surveillance of the COVID-19 pandemic in Costa Rica and worldwide.
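The benchmarking step described above, comparing predicted VOC classes against the expected ones, amounts to an exact-match accuracy over class combinations. The sketch below shows one way to score it; the `+` separator for co-infection labels and the function names `normalize` and `accuracy` are assumptions made for illustration, not the pipeline's actual output format.

```python
def normalize(label):
    """Turn a label such as 'Delta+Omicron' into an order-independent set."""
    return frozenset(part.strip() for part in label.split("+") if part.strip())


def accuracy(expected, predicted):
    """Fraction of cases whose predicted VOC combination matches exactly."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    if not expected:
        raise ValueError("no cases supplied")
    hits = sum(normalize(e) == normalize(p) for e, p in zip(expected, predicted))
    return hits / len(expected)


# Toy labels only: the last case is a co-infection called as a single VOC.
expected = ["Delta", "Alpha+Gamma", "non-VOC", "Delta+Omicron"]
predicted = ["Delta", "Gamma+Alpha", "non-VOC", "Delta"]
print(accuracy(expected, predicted))  # 0.75
```

Scoring whole combinations rather than individual VOCs means that a pipeline which detects only one member of a co-infecting pair is counted as wrong, which is the stricter reading of "predicting the VOC classes and their combinations".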
Chronic gastrointestinal (GI) diseases are the most common diseases in captive common marmosets. To understand the role of the microbiome in GI diseases, we characterized the gut microbiome of 91 healthy marmosets (303 samples) and 59 marmosets diagnosed with inflammatory bowel disease (IBD) (200 samples). Healthy marmosets exhibited “humanized,” Bacteroidetes-dominant microbiomes. After up to 2 years of standardized diet, housing and husbandry, marmoset microbiomes could be classified into four distinct marmoset sources based on Prevotella and Bacteroides levels. Using a random forest (RF) model, marmosets were classified by source with an accuracy of 93% with 100% sensitivity and 95% specificity using abundance data from 4 Prevotellaceae amplicon sequence variants (ASVs), as well as single ASVs from Coprobacter, Parabacteroides, Paraprevotella, Phascolarctobacterium, Oribacterium and Fusobacterium. A single dysbiotic IBD state was not found across all marmoset sources, but IBD was associated with lower alpha diversity and a lower Bacteroides:Prevotella copri ratio within each source. IBD was highest in a Prevotella-dominant cohort, and consistent with Prevotella-linked diseases, pro-inflammatory genes in the jejunum were upregulated. RF analysis of serum biomarkers identified serum calcium, hemoglobin and red blood cell (RBC) counts as potential biomarkers for marmoset IBD. This study characterizes the microbiome of healthy captive common marmosets and demonstrates that source-specific microbiomes can be retained despite standardized diets and husbandry practices. Marmosets with IBD had decreased alpha diversity and a shift in the ratio of Bacteroides:Prevotella copri compared to healthy marmosets.
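The two microbiome summaries highlighted above, alpha diversity and the Bacteroides:Prevotella copri ratio, can be computed per sample from a table of taxon counts. The sketch below is illustrative only: the abstract does not say which alpha-diversity index was used, so the Shannon index is an assumption, as are the pseudocount and the function names `shannon_index` and `abundance_ratio`.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def abundance_ratio(abundances, numerator, denominator, pseudocount=1.0):
    """Ratio of two taxa; the pseudocount avoids division by zero."""
    top = abundances.get(numerator, 0) + pseudocount
    bottom = abundances.get(denominator, 0) + pseudocount
    return top / bottom


# Toy counts for one sample (not data from the study).
sample = {"Bacteroides": 99, "Prevotella copri": 9, "Fusobacterium": 12}
print(round(shannon_index(list(sample.values())), 3))
print(abundance_ratio(sample, "Bacteroides", "Prevotella copri"))  # 10.0
```

An even community gives the maximum Shannon value for its number of taxa (ln 4 for four equally abundant taxa), while a sample dominated by one taxon approaches zero, which is the direction of change reported for marmosets with IBD.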
Emerging mutations and genotypes of the SARS-CoV-2 virus, responsible for the COVID-19 pandemic, have been reported globally. In Costa Rica during the year 2020, a predominant genotype carrying the mutation T1117I in the spike (S:T1117I) was previously identified. To investigate the possible effects of this mutation on the function of the spike, and thus on the biology of the virus, different bioinformatic pipelines based on phylogeny, natural selection and co-evolutionary models, molecular docking, and epitope prediction were implemented. The phylogeny of sequences carrying S:T1117I worldwide showed a polyphyletic group, with the emergence of local lineages. In Costa Rica, the mutation is found in the lineage B.1.1.389, and it is suggested to be a product of positive/adaptive selection. Different changes in the function of the spike protein and a more stable interaction with a ligand (the drug nelfinavir) were found. Only one epitope out of 742 in the spike was affected by the mutation, with some different properties, suggesting scarce changes in the immune response and no influence on vaccine effectiveness. Jointly, these results suggest a partial benefit of the mutation for the spread of the virus with this genotype during the year 2020 in Costa Rica, although possibly not one strong enough to withstand the introduction of new lineages during early 2021, which later became predominant. In addition, the bioinformatic analyses used here can be applied as an in silico strategy to study other mutations of interest in the SARS-CoV-2 virus and other pathogens.