The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project combines data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) with computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55 000 organisms (>4800 viruses, >40 000 prokaryotes and >10 000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.
The Rice Annotation Project Database (RAP-DB) was created to provide the genome sequence assembly of the International Rice Genome Sequencing Project (IRGSP), manually curated annotation of the sequence, and other genomics information that could be useful for a comprehensive understanding of rice biology. Since the last publication of the RAP-DB, the IRGSP genome has been revised and reassembled. In addition, a large number of rice expressed sequence tags have been released, and functional genomics resources have been produced worldwide. Thus, we have thoroughly updated our genome annotation by manual curation of all the functional descriptions of rice genes. The latest version of the RAP-DB contains a variety of annotation data as follows: clone positions, structures and functions of 31 439 genes validated by cDNAs, RNA genes detected by massively parallel signature sequencing (MPSS) technology and sequence similarity, flanking sequences of mutant lines, transposable elements, etc. Other annotation data such as Gnomon can be displayed along with those of RAP for comparison. We have also developed a new keyword search system to allow the user to access useful information. The RAP-DB is available at: http://rapdb.dna.affrc.go.jp/ and http://rapdb.lab.nig.ac.jp/.
The primary structures of 11 ADP-glucose pyrophosphorylase proteins are aligned and compared for relationships among them. These comparisons indicate that many domains are retained in the proteins from both the enteric bacteria and the angiosperm plants. The proteins from angiosperm plants fall into two main groups, one of which contains two subgroups. The two main groups of angiosperm plant proteins are based upon the two subunits of the enzyme, whereas the subgroups of the large subunit group are based upon the tissue in which the particular gene is expressed. Additionally, the small subunit group shows a slight but distinct division based upon whether the protein is from a monocot or dicot source. Previous structure-function studies with the Escherichia coli enzyme have identified regions of the primary structure associated with the substrate binding site, the allosteric activator binding site, and the allosteric inhibitor binding site. There is conservation of the primary structure of the polypeptides for the substrate binding site and the allosteric activator binding site. The nucleotide sequences of the coding regions of the genes of 11 of these proteins are compared for relationships among them. This analysis indicates that the protein for the small subunit has been subject to greater selective pressure to retain a particular primary structure. Also, the coding region of the precursor gene for the small subunit diverged from the coding region of the precursor gene for the large subunits slightly prior to the divergence of the coding regions of the two tissue-specific large subunit genes.
We present here the annotation of the complete genome of rice Oryza sativa L. ssp. japonica cultivar Nipponbare. All functional annotations for proteins and non-protein-coding RNA (npRNA) candidates were manually curated. Functions were identified or inferred in 19,969 (70%) of the proteins, and 131 possible npRNAs (including 58 antisense transcripts) were found. Almost 5000 annotated protein-coding genes were found to be disrupted in insertional mutant lines, which will accelerate future experimental validation of the annotations. The rice loci were determined by using cDNA sequences obtained from rice and other representative cereals. Our conservative estimate based on these loci and an extrapolation suggested that the gene number of rice is ∼32,000, which is smaller than previous estimates. We conducted comparative analyses between rice and Arabidopsis thaliana and found that both genomes possessed several lineage-specific genes, which might account for the observed differences between these species, while they had similar sets of predicted functional domains among the protein sequences. A system to control translational efficiency seems to be conserved across large evolutionary distances. Moreover, the evolutionary process of protein-coding genes was examined. Our results suggest that natural selection may have played a role for duplicated genes in both species, so that duplication was suppressed or favored in a manner that depended on the function of a gene.
Near-full-length cDNA clones for the small and large subunits of the heterotetrameric potato tuber ADP-glucose pyrophosphorylase have been isolated and characterized. The missing amino terminal sequence of the small subunit has also been elucidated from its corresponding genomic clone. Primary sequence comparisons revealed that the two potato subunits had less identity to each other than to their homologous subunits from other plants. It also appeared that the smaller subunit is more conserved among the different plants and the larger subunit more divergent. Amino acid comparisons of both potato tuber sequences to the Escherichia coli ADP-glucose pyrophosphorylase sequence revealed conserved regions important for both catalytic and allosteric function of the bacterial enzyme.