The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The project applies a combination of computation, manual curation, and collaboration to the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) to produce a standard set of stable, non-redundant reference sequences. It augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55 000 organisms (>4800 viruses, >40 000 prokaryotes and >10 000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.
The National Center for Biotechnology Information (NCBI) Taxonomy includes organism names and classifications for every sequence in the nucleotide and protein sequence databases of the International Nucleotide Sequence Database Collaboration. Since the last review of this resource in 2012, it has undergone several improvements. Most notable is the shift from a single SQL database to a series of linked databases tied to a framework of data called NameBank. This means that relations among data elements can be adjusted in more detail, resulting in expanded annotation of synonyms, the ability to flag names with specific nomenclatural properties, enhanced tracking of publications tied to names and improved annotation of scientific authorities and types. Additionally, practices utilized by NCBI Taxonomy curators specific to major taxonomic groups are described, terms peculiar to NCBI Taxonomy are explained, external resources are acknowledged and updates to tools and other resources are documented. Database URL: https://www.ncbi.nlm.nih.gov/taxonomy
The NCBI RefSeq genome collection (http://www.ncbi.nlm.nih.gov/genome) represents all three major domains of life (Eukarya, Bacteria and Archaea) as well as viruses. Prokaryotic genome sequences are the most rapidly growing part of the collection. During 2014, more than 10 000 microbial genome assemblies were publicly released, bringing the total number of prokaryotic genomes close to 30 000. We continue to improve the quality and usability of the microbial genome resources by providing easy access to the data and the results of pre-computed analyses, and by improving analysis and visualization tools. A number of improvements have been incorporated into the Prokaryotic Genome Annotation Pipeline. Several new features have been added to the RefSeq prokaryotic genome data processing pipeline, including the calculation of genome groups (clades) and the optimization of protein cluster generation using a pan-genome approach.
The Gcn4p activation domain contains seven clusters of hydrophobic residues that make additive contributions to transcriptional activation in vivo. We observed efficient binding of a glutathione S-transferase (GST)
Transcription initiation by RNA polymerase II (Pol II) requires assembly of a large complex consisting of Pol II and general transcription factors (GTFs) at the promoter. It has been proposed that assembly of this complex begins when TFIID, consisting of TATA box-binding protein (TBP) and its associated factors (TAF II proteins), binds to the core promoter, followed by sequential binding of other GTFs and Pol II itself (9). In another scenario, Pol II, certain GTFs, and coactivator proteins bind to the promoter as a preformed holoenzyme complex (46). Transcriptional activators bind to the promoter, generally upstream of the TATA element, and stimulate the assembly or function of the transcription initiation complex. Binding of TFIID to the core promoter appears to be rate limiting for initiation (12,43,88), and certain activators stimulate this step in initiation complex formation (3,11,21,39,40,50,91). Several activators bind TBP in vitro in a manner that depends on amino acids in the activation domain that are critical for transcriptional activation in vivo (7,11,26,35,38,51,61-63), suggesting that direct interactions between the activator and TBP are involved in recruiting TFIID to the core promoter. Certain activation domains also bind TFIIB in vitro in a sequence-specific manner (4,7,14,41,56,91) and may stimulate recruitment of this GTF to the initiation complex (15,41,55,56). Other studies suggest that activator function is mediated by one or more of the TAF II coactivator proteins associated with TBP in TFIID. Different activators may require specific TAF II proteins for activation (13,74-76), and indeed, certain activation domains bind preferentially to specific TAF II proteins in vitro (24,37,57,83).
The interactions between activators and TAF II proteins may serve primarily to recruit TFIID to the promoter (75). The human TAF II 250 subunit (and its Saccharomyces cerevisiae homolog yTAF II 130) has histone acetyltransferase (HAT) activity that may also promote initiation complex formation by destabilizing a repressive nucleosome structure at the promoter (64). A yeast Pol II-TAF II complex was shown to be required for transcriptional activation of a Gcn4p-regulated promoter in vitro (44); however, recent studies indicate that yTAF II proteins are not essential for transcriptional activation in vivo by Gcn4p and by several other yeast activator proteins (65,85).