The COVID-19 pandemic is driven by Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2), which emerged in 2019 and quickly spread worldwide. Genomic surveillance has become the gold standard methodology used to monitor and study this fast-spreading virus and its constantly emerging lineages. The current deluge of SARS-CoV-2 genomic data generated worldwide has intensified the need for streamlined bioinformatics workflows. Here, we describe ViralFlow, a workflow developed by our group to process and analyze large-scale SARS-CoV-2 Illumina amplicon sequencing data. This workflow automates all steps of SARS-CoV-2 reference-based genomic analysis: data processing, genome assembly, PANGO lineage assignment, mutation analysis and the screening of intrahost variants. The pipeline is capable of processing a batch of around 100 samples in less than half an hour on a personal laptop or in less than five minutes on a server with 50 threads. The workflow is available through Docker or Singularity images, allowing for implementation on laptops for small-scale analyses or on high processing capacity servers or clusters. Moreover, the low requirements for memory and CPU cores and the standardized results provided by ViralFlow highlight it as a versatile tool for SARS-CoV-2 genomic analysis.
OGEE is an Online GEne Essentiality database. Gene essentiality is not a static, binary property but rather a context-dependent and evolvable property in all forms of life. In OGEE we collect not only experimentally tested essential and non-essential genes, but also associated gene properties that contribute to gene essentiality. We tagged conditionally essential genes that show variable essentiality statuses across datasets to highlight complex interplays between gene functions and environmental/experimental perturbations. OGEE v3 contains gene essentiality datasets for 91 species, almost double the 48 species in the previous version. To accommodate recent advances on human cancer essential genes (also known as tumor dependency genes) that could serve as targets for cancer treatment and/or drug development, we expanded the collection of human essential genes from 16 cell lines in the previous version to 581. These human cancer cell lines were tested with high-throughput experiments such as CRISPR-Cas9 and RNAi; 150 of them were tested by both techniques. We also included factors known to contribute to gene essentiality for these cell lines, such as genomic mutation, methylation and gene expression, along with extensive graphical visualizations for ease of understanding of these factors. OGEE v3 is freely accessible at https://v3.ogee.info.
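The tagging rule described in the OGEE abstract (a gene is conditionally essential when its status varies across datasets) can be illustrated with a minimal Python sketch. This is not OGEE's actual code or schema: the function name `tag_essentiality`, the record layout `(gene, dataset, is_essential)`, the tag strings, and the example gene and dataset names are all assumptions made for illustration.

```python
from collections import defaultdict


def tag_essentiality(records):
    """Tag each gene from (gene, dataset, is_essential) records.

    Hypothetical record layout, not the OGEE schema. A gene seen as
    essential in every dataset is "essential", never essential is
    "non-essential", and mixed statuses are "conditionally essential".
    """
    statuses = defaultdict(set)
    for gene, _dataset, is_essential in records:
        statuses[gene].add(bool(is_essential))

    tags = {}
    for gene, seen in statuses.items():
        if seen == {True}:
            tags[gene] = "essential"
        elif seen == {False}:
            tags[gene] = "non-essential"
        else:
            # Both True and False were observed across datasets.
            tags[gene] = "conditionally essential"
    return tags


# Made-up example records (gene and dataset names are placeholders).
records = [
    ("geneA", "crispr_line1", True),
    ("geneA", "rnai_line1", True),
    ("geneB", "crispr_line1", False),
    ("geneB", "crispr_line2", False),
    ("geneC", "crispr_line1", True),
    ("geneC", "rnai_line2", False),
]
tags = tag_essentiality(records)
```

Here `geneC` would be tagged conditionally essential because the two placeholder datasets disagree; a real analysis would also need to account for differences in experimental technique and scoring thresholds, which this sketch ignores.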
The availability of whole-genome sequences and associated multi-omics data sets, combined with advances in gene knockout and knockdown methods, has enabled large-scale annotation and exploration of gene and protein functions in eukaryotes. Knowing which genes are essential for the survival of eukaryotic organisms is paramount for an understanding of the basic mechanisms of life, and could assist in identifying intervention targets in eukaryotic pathogens and cancer. Here, we studied essential gene orthologs among selected species of eukaryotes, and then employed a systematic machine-learning approach, using protein sequence-derived features and selection procedures, to investigate essential gene predictions within and among species. We showed that essential gene orthologs comprise only a small fraction of the total number of orthologs among the eukaryotic species studied. In addition, we demonstrated that machine-learning models trained with subsets of essentiality-related data performed better than random guessing of gene essentiality for a particular species. Consistent with our gene ortholog analysis, predicting essential genes among multiple (including distantly related) species is possible, yet challenging, suggesting that most essential genes are unique to a species. The present work provides a foundation for the expansion of genome-wide essentiality investigations in eukaryotes using machine-learning approaches.
Characterizing genes that are critical for the survival of an organism (i.e. essential) is important to gain a deep understanding of the fundamental cellular and molecular mechanisms that sustain life. Functional genomic investigations of the vinegar fly, Drosophila melanogaster, have unravelled the functions of numerous genes of this model species, but results from phenomic experiments can sometimes be ambiguous. Moreover, the features underlying gene essentiality are poorly understood, posing challenges for computational prediction. Here, we harnessed comprehensive genomic-phenomic datasets publicly available for D. melanogaster and a machine-learning-based workflow to predict essential genes of this fly. We discovered strong predictors of such genes, paving the way for computational predictions of essentiality in less-studied arthropod pests and vectors of infectious diseases.
Sequencing Multiple Brazilian Acinetobacter Genomes
... genes associated with the synthesis of the capsular antigens were noticeably more variable in the ST113 and ST79 strains. Indeed, several resistance and virulence genes were common to the ST79 and ST113 strains only, despite a greater genetic distance between them, suggesting common means of genetic exchange. Our comparative analysis reveals the spread of multiple STs and the genomic plasticity of A. baumannii from different hospitals in a single metropolitan area. It also highlights differences in the spread of resistance markers and other MGEs between the investigated STs, with implications for the monitoring and treatment of Acinetobacter in ongoing and future outbreaks.
The rapid worldwide spread of chikungunya (CHIKV), dengue (DENV), and Zika (ZIKV) viruses has raised great international concern. Knowledge about the entry routes and geographic expansion of these arboviruses into the mainland Americas remains incomplete and controversial. Epidemics caused by arboviruses continue to cause socioeconomic burden globally, particularly in countries where vector control is difficult due to climatic or infrastructure factors. Understanding how these viruses circulate and move from one country to another is of paramount importance to assist government and health officials in anticipating future epidemics, as well as in taking steps to help control or mitigate their spread. Through the analysis of arbovirus genome sequences collected at different locations over time, we identified patterns of accumulated mutations, allowing us to trace the routes of dispersion of these viruses. Here, we applied robust phylogenomic methods to trace the evolutionary dynamics of these arboviruses with special focus on Brazil, the epicenter of these triple epidemics. Our results show that CHIKV, DENV-1–4, and ZIKV followed a similar path prior to their first introductions into the mainland Americas, underscoring the need for systematic arboviral surveillance at major entry points of human population movement between countries such as airports and seaports.
Chikungunya virus (CHIKV) is an RNA virus from the Togaviridae family transmitted by mosquitoes in both sylvatic and urban cycles. In humans, CHIKV infection leads to a febrile illness, termed chikungunya fever (CHIKF), commonly associated with more intense and debilitating outcomes. CHIKV arrived in Brazil in 2014 through two independent introductions: the Asian/Caribbean genotype entered through the North region and the African ECSA genotype was imported through the Northeast region. Following their initial introduction, both genotypes established their urban cycle among large naive human populations, causing several outbreaks in the Americas. Here, we sequenced CHIKV genomes from a recent outbreak in the Northeast region of Brazil, employing an in-house developed Next-Generation Sequencing (NGS) protocol capable of directly detecting multiple known CHIKV genotypes from positive clinical samples. Our results demonstrate that both Asian/Caribbean and ECSA genotypes expanded their ranges, reaching cocirculation in the Northeast region of Brazil. In addition, our NGS data support the finding of simultaneous infection by these two genotypes, suggesting that coinfection might be more common than previously thought in highly endemic areas. Future efforts to understand CHIKV epidemiology should thus take into consideration the possibility of coinfection by different genotypes in the human population.