Yīmíng Bào scite author profile

The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55 000 organisms (>4800 viruses, >40 000 prokaryotes and >10 000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.

show abstract

The Influenza Virus Resource at the National Center for Biotechnology Information

Bào

Bolotov

Dernovoy

et al. 2008

J Virol

889

790

View full text Add to dashboard Cite

NCBI Viral Genomes Resource

et al. 2014

View full text Add to dashboard Cite

Recent technological innovations have ignited an explosion in virus genome sequencing that promises to fundamentally alter our understanding of viral biology and profoundly impact public health policy. Yet, any potential benefits from the billowing cloud of next generation sequence data hinge upon well implemented reference resources that facilitate the identification of sequences, aid in the assembly of sequence reads and provide reference annotation sources. The NCBI Viral Genomes Resource is a reference resource designed to bring order to this sequence shockwave and improve usability of viral sequence data. The resource can be accessed at http://www.ncbi.nlm.nih.gov/genome/viruses/ and catalogs all publicly available virus genome sequences and curates reference genome sequences. As the number of genome sequences has grown, so too have the difficulties in annotating and maintaining reference sequences. The rapid expansion of the viral sequence universe has forced a recalibration of the data model to better provide extant sequence representation and enhanced reference sequence products to serve the needs of the various viral communities. This, in turn, has placed increased emphasis on leveraging the knowledge of individual scientific communities to identify important viral sequences and develop well annotated reference virus genome sets.

show abstract

Whole-Genome Analysis of Human Influenza A Virus Reveals Multiple Persistent Lineages and Reassortment among Recent H3N2 Viruses

et al. 2005

View full text Add to dashboard Cite

Understanding the evolution of influenza A viruses in humans is important for surveillance and vaccine strain selection. We performed a phylogenetic analysis of 156 complete genomes of human H3N2 influenza A viruses collected between 1999 and 2004 from New York State, United States, and observed multiple co-circulating clades with different population frequencies. Strikingly, phylogenies inferred for individual gene segments revealed that multiple reassortment events had occurred among these clades, such that one clade of H3N2 viruses present at least since 2000 had provided the hemagglutinin gene for all those H3N2 viruses sampled after the 2002–2003 influenza season. This reassortment event was the likely progenitor of the antigenically variant influenza strains that caused the A/Fujian/411/2002-like epidemic of the 2003–2004 influenza season. However, despite sharing the same hemagglutinin, these phylogenetically distinct lineages of viruses continue to co-circulate in the same population. These data, derived from the first large-scale analysis of H3N2 viruses, convincingly demonstrate that multiple lineages can co-circulate, persist, and reassort in epidemiologically significant ways, and underscore the importance of genomic analyses for future influenza surveillance.

show abstract

Virus Variation Resource – improved response to emergent viral outbreaks

et al. 2016

View full text Add to dashboard Cite

The Virus Variation Resource is a value-added viral sequence data resource hosted by the National Center for Biotechnology Information. The resource is located at http://www.ncbi.nlm.nih.gov/genome/viruses/variation/ and includes modules for seven viral groups: influenza virus, Dengue virus, West Nile virus, Ebolavirus, MERS coronavirus, Rotavirus A and Zika virus. Each module is supported by pipelines that scan newly released GenBank records, annotate genes and proteins and parse sample descriptors and then map them to controlled vocabulary. These processes in turn support a purpose-built search interface where users can select sequences based on standardized gene, protein and metadata terms. Once sequences are selected, a suite of tools for downloading data, multi-sequence alignment and tree building supports a variety of user directed activities. This manuscript describes a series of features and functionalities recently added to the Virus Variation Resource.

show abstract

Taxonomy of the order Mononegavirales: update 2016

et al. 2016

View full text Add to dashboard Cite

In 2016, the order Mononegavirales was emended through the addition of two new families (Mymonaviridae and Sunviridae), the elevation of the paramyxoviral subfamily Pneumovirinae to family status (Pneumoviridae), the addition of five free-floating genera (Anphevirus, Arlivirus, Chengtivirus, Crustavirus, and Wastrivirus), and several other changes at the genus and species levels. This article presents the updated taxonomy of the order Mononegavirales as now accepted by the International Committee on Taxonomy of Viruses (ICTV).

show abstract

Large-scale sequencing of human influenza reveals the dynamic nature of viral genome evolution

Ghedin¹,

Sengamalay²,

Shumway³

et al. 2005

Nature

410

300

View full text Add to dashboard Cite

The Genome Sequence Archive Family: Toward Explosive Data Growth and Diverse Data Types

Chen

Zhang

et al. 2021

Genomics, Proteomics & Bioinformatics

564

191

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Yīmíng Bào

Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation

The Influenza Virus Resource at the National Center for Biotechnology Information

NCBI Viral Genomes Resource

Whole-Genome Analysis of Human Influenza A Virus Reveals Multiple Persistent Lineages and Reassortment among Recent H3N2 Viruses

Virus Variation Resource – improved response to emergent viral outbreaks

Taxonomy of the order Mononegavirales: update 2016

Large-scale sequencing of human influenza reveals the dynamic nature of viral genome evolution

The Genome Sequence Archive Family: Toward Explosive Data Growth and Diverse Data Types

Contact Info

Product

Resources

About