RegulonDB, first published 20 years ago, is a comprehensive electronic resource on the regulation of transcription initiation in Escherichia coli K-12 that gathers decades of knowledge from classic molecular biology experiments and, more recently, from high-throughput genomic methodologies. We curated the literature to keep RegulonDB up to date and initiated the curation of ChIP and gSELEX experiments. We estimate that current knowledge describes between 10% and 30% of the expected total number of transcription factor-gene regulatory interactions in E. coli. RegulonDB provides datasets of interactions for which there is no evidence that they affect expression, as well as expression datasets. We developed a proof-of-concept pipeline that merges binding and expression evidence to identify regulatory interactions. These datasets can be visualized in the RegulonDB JBrowse. We developed the Microbial Conditions Ontology, which provides a controlled vocabulary for the minimal properties needed to reproduce an experiment and helps integrate data from high-throughput experiments and the classic literature. At a higher level of integration, we report Genetic Sensory-Response Units for 200 transcription factors, including their regulation at the metabolic level, and include summaries for 70 of them. Finally, we summarize our research on natural language processing strategies to enhance our biocuration work.
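As a hypothetical illustration of what merging binding and expression evidence involves, the sketch below labels each transcription factor-gene pair by whether it is supported by binding (for example, a ChIP-seq peak), by expression (for example, differential expression after perturbing the TF), or by both. The function, class, and field names are assumptions made for illustration; this is not the RegulonDB pipeline.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    tf: str
    gene: str
    evidence: str  # "binding+expression", "binding only" or "expression only"


def merge_evidence(bound: dict[str, set[str]],
                   expressed: dict[str, set[str]]) -> list[Interaction]:
    """Label every TF-gene pair by the kind of evidence that supports it.

    bound:     TF -> genes with a binding site near them (hypothetical input)
    expressed: TF -> genes whose expression changes when the TF is perturbed
    """
    merged = []
    for tf in sorted(set(bound) | set(expressed)):
        b = bound.get(tf, set())
        e = expressed.get(tf, set())
        for gene in sorted(b | e):
            if gene in b and gene in e:
                label = "binding+expression"
            elif gene in b:
                label = "binding only"
            else:
                label = "expression only"
            merged.append(Interaction(tf, gene, label))
    return merged


# Toy input with placeholder names, not RegulonDB data.
for row in merge_evidence({"TF1": {"g1", "g2"}}, {"TF1": {"g1", "g3"}}):
    print(row.tf, row.gene, row.evidence)
```

Only pairs labeled "binding+expression" would count as regulatory interactions under this simplified rule; binding-only pairs correspond to the interactions with no evidence yet of an effect on expression.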
Genomics has laid the foundation for a variety of methodologies that produce high-throughput datasets identifying the different players that define gene regulation, particularly the regulation of transcription initiation and operon organization. These datasets are available in public repositories such as the Gene Expression Omnibus or ArrayExpress. However, accessing and navigating such a wealth of data is not straightforward. No resource currently offers all available high- and low-throughput data on transcriptional regulation in Escherichia coli K-12 in a form that can easily be used either as whole datasets or as individual interactions and regulatory elements. RegulonDB (https://regulondb.ccg.unam.mx) began gathering high-throughput dataset collections in 2009, starting with transcription start sites, then added ChIP-seq and gSELEX in 2012, reaching 99 different experimental high-throughput datasets by 2019. In this paper we present a radical upgrade to more than 2000 high-throughput datasets, processed to facilitate their comparison, introducing up-to-date collections of transcription termination sites and transcription units, transcription factor binding interactions derived from ChIP-seq, ChIP-exo, gSELEX and DAP-seq experiments, and expression profiles derived from RNA-seq experiments. For ChIP-seq experiments we offer both the data as presented by the authors and data uniformly processed in-house, which enhances their comparability, the traceability of the methods, and the reproducibility of the results. Furthermore, we have expanded the tools available for browsing and visualization across and within datasets. We include comparisons against previously existing knowledge in RegulonDB from classic experiments, a nucleotide-resolution genome viewer, and an interface that enables users to browse datasets by querying their metadata.
A particular effort was made to automatically extract detailed experimental growth conditions through an assisted curation strategy that applies natural language processing and machine learning. We provide summaries of the total number of interactions found in each experiment, as well as tools to identify results shared among different experiments. This is a long-awaited resource for making use of this wealth of knowledge and advancing our understanding of the biology of the model bacterium E. coli K-12.
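In its simplest form, identifying results shared among experiments reduces to counting how many datasets report each transcription factor-gene pair. The sketch below assumes that each experiment has already been reduced to a set of (TF, gene) pairs; the function name and the dataset identifiers in the usage are hypothetical.

```python
from collections import Counter


def shared_interactions(experiments: dict[str, set[tuple[str, str]]],
                        min_support: int = 2) -> dict[tuple[str, str], int]:
    """Return (TF, gene) pairs reported by at least `min_support` experiments."""
    counts: Counter = Counter()
    for pairs in experiments.values():
        counts.update(pairs)  # a set, so each experiment counts a pair at most once
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical dataset identifiers and placeholder TF/gene names.
experiments = {
    "chipseq_A": {("TF1", "g1"), ("TF1", "g2")},
    "gselex_B": {("TF1", "g1")},
    "dapseq_C": {("TF1", "g1"), ("TF1", "g3")},
}
print(shared_interactions(experiments))
```

Here only `("TF1", "g1")` is returned, with a support of 3, because the other two pairs each appear in a single dataset. Raising `min_support` trades coverage for agreement across methods.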
The SARS-CoV-2 pandemic is one of the most concerning health problems around the globe. We reported the emergence of SARS-CoV-2 variant B.1.1.519 in Mexico City. We reported the effective reproduction number (Rt) of B.1.1.519 and presented evidence of its geographical origin based on phylogenetic analysis. We also studied its evolution via haplotype analysis and identified the most recurrent haplotypes. Finally, we studied the clinical impact of B.1.1.519. The B.1.1.519 variant was predominant between November 2020 and May 2021, reaching 90% of all cases sequenced in February 2021. It is characterized by three amino acid changes in the spike protein: T478K, P681H, and T732A. Its Rt ranged between 0.5 and 2.9. Its geographical origin remains to be investigated. Patients infected with variant B.1.1.519 showed a highly significant adjusted odds ratio (aOR) of 1.85 relative to non-B.1.1.519 patients for developing a severe/critical outcome (p = 0.000296, 95% CI 1.33–2.6) and a 2.35-fold increase for hospitalization (p = 0.005, 95% CI 1.32–4.34). Continuous monitoring of this and other variants will be required to control the ongoing pandemic as it evolves.
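The abstract reports Rt without naming the estimator. As a point of reference only, the sketch below implements one widely used formulation, the renewal-equation (Cori-style) instantaneous reproduction number, in its simplest ratio form. It assumes daily case counts and a discretized serial-interval distribution, and it omits the sliding window and Bayesian prior used in practice; it is not presented as the method behind the reported 0.5–2.9 range.

```python
from __future__ import annotations


def instantaneous_rt(incidence: list[float],
                     serial_interval: list[float]) -> list[float | None]:
    """Simplest ratio form of the renewal-equation estimate.

    R_t = I_t / sum_{s=1..S} I_{t-s} * w_s, where w_s (serial_interval[s - 1])
    is the probability that the serial interval is s days.
    Returns None for days with no earlier infection pressure.
    """
    total = sum(serial_interval)
    w = [x / total for x in serial_interval]  # normalise to sum to 1
    rt: list[float | None] = []
    for t, cases in enumerate(incidence):
        # Expected infectiousness on day t generated by cases on earlier days.
        pressure = sum(incidence[t - s] * w[s - 1]
                       for s in range(1, min(t, len(w)) + 1))
        rt.append(cases / pressure if pressure > 0 else None)
    return rt


# Toy series that doubles daily with a 1-day serial interval: R_t = 2 after day 0.
print(instantaneous_rt([1, 2, 4, 8, 16], [1.0]))
```

Values of Rt above 1 indicate a growing epidemic and values below 1 indicate a shrinking one, which is how a range such as 0.5 to 2.9 can be read over time.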
The SARS-CoV-2 pandemic is one of the most concerning health problems around the globe. We report the emergence of SARS-CoV-2 variant B.1.1.519 in Mexico City. This variant represented up to 90% of sequenced cases in February 2021. It is characterized by three amino acid changes in the spike protein: T478K, P681H, and T732A. We report the effective reproduction number of B.1.1.519 and present evidence of its geographical origin based on phylogenetic analysis. We also studied its evolution via haplotype analysis and identified the most recurrent haplotypes. Finally, we studied the clinical impact of B.1.1.519: patients infected with variant B.1.1.519 showed a highly significant adjusted odds ratio (aOR) of 1.85 relative to non-B.1.1.519 patients for developing a severe/critical outcome (P = 0.000296, 95% CI 1.33–2.6) and a 2.35-fold increase for hospitalization (P = 0.005, 95% CI 1.32–4.34). Continuous monitoring of this and other variants will be required to control the ongoing pandemic as it evolves.
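The odds ratios, confidence intervals, and P values above are linked through the usual Wald approximation on the log-odds scale: SE = (ln U − ln L) / (2 × 1.96) and z = ln(aOR) / SE. The sketch below assumes Wald-type intervals (the abstract does not state how they were computed) and recovers the implied two-sided P value from the published numbers as a consistency check. The function name is hypothetical.

```python
import math


def implied_p_value(odds_ratio: float, lower: float, upper: float,
                    z_crit: float = 1.959964) -> float:
    """Two-sided Wald P value implied by an odds ratio and its 95% CI.

    Assumes the interval is symmetric on the log scale:
    ln(upper) - ln(lower) = 2 * z_crit * SE.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


print(round(implied_p_value(1.85, 1.33, 2.6), 5))   # severe/critical outcome
print(round(implied_p_value(2.35, 1.32, 4.34), 4))  # hospitalization
```

For aOR 1.85 (1.33–2.6) this gives P ≈ 0.0003, and for 2.35 (1.32–4.34) P ≈ 0.005, close to the reported values; small discrepancies may come from rounding of the published limits or from the test used in the original model.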
Studies have suggested a potential role of somatic mitochondrial mutations in cancer development. To analyze the landscape of somatic mitochondrial mutations in breast cancer and to determine whether mitochondrial DNA (mtDNA) mutational burden correlates with overall survival (OS), we sequenced the whole mtDNA of 92 matched pairs of primary breast tumors and peripheral blood samples. A total of 324 germline variants and 173 somatic mutations were found in the tumors. The most common germline allele was 663G (12S), which showed lower heteroplasmy levels in peripheral blood lymphocytes than in the matched tumors, even reaching homoplasmy in several cases. The heteroplasmy load was higher in tumors than in their paired normal tissues. Somatic mtDNA mutations were found in 73.9% of breast tumors; 59% of these mutations were located in the coding region (66.7% non-synonymous and 33.3% synonymous). Although the CO1 gene carried the highest number of mutations, the tRNA genes (T, C, and W), the 12S rRNA, CO1, and ATP6 exhibited the highest mutation rates. No specific mtDNA mutational profile was associated with the molecular subtypes of breast cancer, and we found no correlation between mtDNA mutational burden and OS. Future investigations should provide insight into the molecular mechanisms through which mtDNA mutations and heteroplasmy shifts contribute to breast cancer development.
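Two quantities in this abstract are easy to conflate: the heteroplasmy level at a site and the mutation rate of a gene. The sketch below uses a common read-based definition of heteroplasmy (the fraction of reads carrying the variant) and normalizes mutation counts by gene length. This normalization is why a short gene such as a tRNA can show a higher rate than CO1 despite carrying fewer mutations. The function names and the toy gene names and lengths in the usage are hypothetical, not data from the study.

```python
def heteroplasmy(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads carrying the variant allele (1.0 = homoplasmic)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def mutations_per_kb(mutations: dict[str, int],
                     lengths_bp: dict[str, int]) -> dict[str, float]:
    """Mutation count of each gene normalised to 1,000 bp of sequence."""
    return {gene: 1000 * n / lengths_bp[gene] for gene, n in mutations.items()}


# Toy numbers with hypothetical gene names, not results from the study:
# the long gene carries more mutations, the short one has the higher rate.
rates = mutations_per_kb({"long_gene": 10, "short_gene": 2},
                         {"long_gene": 1500, "short_gene": 70})
print(rates)
print(heteroplasmy(30, 100))
```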
At the heart of genomics lies the precise determination of an organism’s DNA sequence. Palacios-Flores et al. present a simple, sensitive, precise, and essentially non-statistical solution for generating genome-wide variation profiles and refining reference genomes...
Omicron is the most mutated SARS-CoV-2 variant, a factor that can affect transmissibility, disease severity, and immune evasiveness. Its genomic surveillance is important in large cities that are economic centers, such as Mexico City, with its millions of inhabitants. Results. From 16 November to 31 December 2021, we observed an 88% increase in Omicron prevalence in Mexico City. We explored the R346K substitution, present in 42% of Omicron variants and known to be associated with immune escape from monoclonal antibodies. In a phylogenetic analysis, we found several independent exchanges between Mexico and the rest of the world, and one event followed by local transmission that gave rise to most of the Omicron diversity in Mexico City. A haplotype analysis revealed no association between haplotype and vaccination status. Among the 66% of patients who had been vaccinated, no reported comorbidities were associated with Omicron; the presence of odynophagia and the absence of dysgeusia were significant predictor symptoms for Omicron, and RT-qPCR Ct values were lower for Omicron. Conclusions. Genomic surveillance is key to detecting the emergence and spread of SARS-CoV-2 variants in a timely manner, even weeks before the onset of an infection wave; it can inform public health decisions and detect the spread of any mutation that may affect therapeutic efficacy.
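Because RT-qPCR roughly doubles the target each cycle, a lower Ct value implies more starting template. The sketch below converts a Ct difference into an approximate fold difference, assuming a fixed amplification efficiency (1.0 means perfect doubling). The abstract does not report the size of the Ct difference, so the values in the usage line are purely illustrative, and the function name is hypothetical.

```python
def fold_difference(ct_reference: float, ct_sample: float,
                    efficiency: float = 1.0) -> float:
    """Approximate ratio of starting template implied by two Ct values.

    With efficiency E (1.0 = perfect doubling per cycle), a sample that crosses
    the threshold delta_ct cycles earlier started with about (1 + E) ** delta_ct
    times more target.
    """
    delta_ct = ct_reference - ct_sample
    return (1 + efficiency) ** delta_ct


# Illustrative Ct values only; the abstract does not report them.
print(fold_difference(ct_reference=25.0, ct_sample=22.0))
```

Real assays rarely reach exactly 100% efficiency, so such fold estimates should be read as rough orders of magnitude.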
Significance. The precise location of variants in the human genome is of utmost importance. We present a unique approach, coverage-based single nucleotide variant (SNV) identification (COBASI), which uses only perfect matches between the reads of a sequencing project and a reference genome to detect and accurately identify de novo SNVs. From the perfect matches, a representation of the read coverage per nucleotide along the genome, the variation landscape, is generated. SNVs are then pinpointed as significant changes in coverage, and de novo SNVs can be identified with high precision. The performance of COBASI was analyzed using simulations and experimentally validated by sequencing de novo SNVs identified from a parent–offspring trio. We propose this pipeline as a useful tool for different genomic applications.
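The idea of pinpointing SNVs from perfect-match coverage can be illustrated with a simplified, hypothetical k-mer version: count how many read k-mers exactly match each reference k-mer, then look for dips. A single substitution at reference position p removes every k-mer that spans p, producing a run of exactly k zero-coverage positions that ends at p. This toy sketch ignores sequencing errors, repeats, and the statistical thresholds a real pipeline needs, and it is not the published COBASI implementation; the function names are assumptions.

```python
from collections import Counter


def kmer_coverage(reference: str, reads: list[str], k: int) -> list[int]:
    """Number of read k-mers that exactly match reference[i:i + k], for each i."""
    read_kmers = Counter(read[j:j + k]
                         for read in reads
                         for j in range(len(read) - k + 1))
    return [read_kmers[reference[i:i + k]]
            for i in range(len(reference) - k + 1)]


def candidate_snvs(coverage: list[int], k: int) -> list[int]:
    """Positions where coverage is zero for exactly k consecutive k-mer starts.

    A substitution at reference position p removes the k-mers starting at
    p - k + 1 .. p, so the dip ends at p. Dips touching either end are skipped.
    """
    snvs = []
    i, n = 0, len(coverage)
    while i < n:
        if coverage[i] != 0:
            i += 1
            continue
        start = i
        while i < n and coverage[i] == 0:
            i += 1
        end = i - 1  # last k-mer start inside the dip
        if start > 0 and i < n and end - start + 1 == k:
            snvs.append(end)
    return snvs


reference = "ACGTTGCAAGGCTTACCGAT"
sample = reference[:10] + "A" + reference[11:]   # G -> A at position 10
reads = [sample[:14], sample[6:]]                 # two overlapping reads
print(candidate_snvs(kmer_coverage(reference, reads, 4), 4))
```

In the toy example every reference 4-mer occurs once, so the four 4-mers spanning position 10 drop to zero coverage and the dip ends at position 10, the simulated substitution.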