The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55 000 organisms (>4800 viruses, >40 000 prokaryotes and >10 000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.
Female Aedes aegypti mosquitoes infect more than 400 million people each year with dangerous viral pathogens including dengue, yellow fever, Zika and chikungunya. Progress in understanding the biology of mosquitoes and developing the tools to fight them has been slowed by the lack of a high-quality genome assembly. Here we combine diverse technologies to produce the markedly improved, fully re-annotated AaegL5 genome assembly, and demonstrate how it accelerates mosquito science. We anchored physical and cytogenetic maps, doubled the number of known chemosensory ionotropic receptors that guide mosquitoes to human hosts and egg-laying sites, provided further insight into the size and composition of the sex-determining M locus, and revealed copy-number variation among glutathione S-transferase genes that are important for insecticide resistance. Using high-resolution quantitative trait locus and population genomic analyses, we mapped new candidates for dengue vector competence and insecticide resistance. AaegL5 will catalyse new biological insights and intervention strategies to fight this deadly disease vector.
DNA methylation, especially CpG methylation at promoter regions, has been generally considered as a potent epigenetic modification that prohibits transcription factor (TF) recruitment, resulting in transcription suppression. Here, we used a protein microarray-based approach to systematically survey the entire human TF family and found numerous purified TFs with methylated CpG (mCpG)-dependent DNA-binding activities. Interestingly, some TFs exhibit specific binding activity to methylated and unmethylated DNA motifs of distinct sequences. To elucidate the underlying mechanism, we focused on Kruppel-like factor 4 (KLF4), and decoupled its mCpG- and CpG-binding activities via site-directed mutagenesis. Furthermore, KLF4 binds specific methylated or unmethylated motifs in human embryonic stem cells in vivo. Our study suggests that mCpG-dependent TF binding activity is a widespread phenomenon and provides a new framework to understand the role and mechanism of TFs in epigenetic regulation of gene transcription. DOI: http://dx.doi.org/10.7554/eLife.00726.001
Comprehensive genome annotation is essential to understand the impact of clinically relevant variants. However, the absence of a standard for clinical reporting and browser display complicates the process of consistent interpretation and reporting. To address these challenges, Ensembl/GENCODE (ref. 1) and RefSeq (ref. 2) launched a joint initiative, the Matched Annotation from NCBI and EMBL-EBI (MANE) collaboration, to converge on human gene and transcript annotation and to jointly define a high-value set of transcripts and corresponding proteins. Here, we describe the MANE transcript sets for use as universal standards for variant reporting and browser display. The MANE Select set identifies a representative transcript for each human protein-coding gene, whereas the MANE Plus Clinical set provides additional transcripts at loci where the Select transcripts alone are not sufficient to report all currently known clinical variants. Each MANE transcript represents an exact match between the exonic sequences of an Ensembl/GENCODE transcript and its counterpart in RefSeq such that the identifiers can be used synonymously. We have now released MANE Select transcripts for 97% of human protein-coding genes, including all American College of Medical Genetics and Genomics Secondary Findings list v3.0 (ref. 3) genes. MANE transcripts are accessible from major genome browsers and key resources. Widespread adoption of these transcript sets will increase the consistency of reporting, facilitate the exchange of data regardless of the annotation source and help to streamline clinical interpretation.
The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community.
A pharmacophore and an alignment rule have previously been reported for BzR agonist ligands. The design and synthesis of 6-(propyloxy)-4-(methoxymethyl)-beta-carboline-3-carboxylic acid ethyl ester (6-PBC, 24, IC50 = 8.1 nM) was based on this pharmacophore. When evaluated in vivo this ligand exhibited anticonvulsant/anxiolytic activity but was devoid of the muscle relaxant/ataxic effects of "classical" 1,4-benzodiazepines (i.e., diazepam). Significantly, 6-PBC 24 also reversed diazepam-induced muscle relaxation in mice. The 3-substituted analogues 40-46 and 48 of 6-PBC 24 and Zk 93423 27 (IC50 = 1 nM) were synthesized and evaluated in vitro to determine what effect these modifications would have on the binding affinity at recombinant BzR subtypes. With the exception of the 3-amino ligands 40 and 41, all the beta-carbolines were found to exhibit high binding affinity at BzR sites. The 3-propyl ether derivative 45 was also evaluated in vivo and found to be devoid of any proconvulsant or anticonvulsant activity at doses up to 40 mg/kg. The 6-(1-naphthylmethyloxy) and 6-octyloxy analogues 25, 26, 28, and 29 of 6-PBC 24 were synthesized to further evaluate the proposed alignment of agonists vs inverse agonists in the pharmacophore of the BzR. In addition, ligands 26 and 29 were designed to probe the dimensions of lipophilic pocket L3 at the agonist site. The activity of 29 was evaluated in vivo; however, this analogue elicited no pharmacological effects at doses up to 80 mg/kg. These and other related beta-carbolines were also examined in five recombinant GABAA receptor subtypes. Ligands 52-61 all exhibited moderate to high affinity at GABAA receptors containing alpha1 subunits. These ligands will be useful in further defining the pharmacophore at alpha1 beta3 gamma2 receptors.
Numerous transcription factors have been identified which have profound effects on developing neurons. A fundamental problem is to identify genes downstream of these factors and order them in developmental pathways. We have previously identified 85 genes with changed expression in the trigeminal ganglia of mice lacking Brn3a, a transcription factor encoded by the Pou4f1 gene. Here we use locus-wide chromatin immunoprecipitation in embryonic trigeminal neurons to show that Brn3a is a direct repressor of two of these downstream genes, NeuroD1 and NeuroD4, and also directly modulates its own expression. Comparison of Brn3a binding to the Pou4f1 locus in vitro and in vivo reveals that not all high affinity sites are occupied, and several Brn3a binding sites identified in the promoters of genes that are silent in sensory ganglia are also not occupied in vivo. Site occupancy by Brn3a can be correlated with evolutionary conservation of the genomic regions containing the recognition sites and also with histone modifications found in regions of chromatin active in transcription and gene regulation, suggesting that Brn3a binding is highly context dependent.