Guangya Duan scite author profile

Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2021

The National Genomics Data Center (NGDC), part of the China National Center for Bioinformation (CNCB), provides a suite of database resources to support worldwide research activities in both academia and industry. With the explosive growth of multi-omics data, CNCB-NGDC is continually expanding, updating and enriching its core database resources through big data deposition, integration and translation. In the past year, considerable efforts have been devoted to 2019nCoVR, a newly established resource providing a global landscape of SARS-CoV-2 genomic sequences, variants, and haplotypes, as well as Aging Atlas, BrainBase, GTDB (Glycosyltransferases Database), LncExpDB, and TransCirc (Translation potential for circular RNAs). Meanwhile, a series of resources have been updated and improved, including BioProject, BioSample, GWH (Genome Warehouse), GVM (Genome Variation Map), GEN (Gene Expression Nebulas) as well as several biodiversity and plant resources. Particularly, BIG Search, a scalable, one-stop, cross-database search engine, has been significantly updated by providing easy access to a large number of internal and external biological resources from CNCB-NGDC, our partners, EBI and NCBI. All of these resources along with their services are publicly accessible at https://bigd.big.ac.cn.


The Global Landscape of SARS-CoV-2 Genomes, Variants, and Haplotypes in 2019nCoVR

Song, Ma, Zou et al. 2020


On January 22, 2020, the China National Center for Bioinformation (CNCB) released the 2019 Novel Coronavirus Resource (2019nCoVR), an open-access information resource for the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). 2019nCoVR features a comprehensive integration of sequence and clinical information for all publicly available SARS-CoV-2 isolates, which are manually curated with value-added annotations and quality evaluated by an automated in-house pipeline. Of particular note, 2019nCoVR offers systematic analyses to generate a dynamic landscape of SARS-CoV-2 genomic variations at a global scale. It provides all identified variants and their detailed statistics for each virus isolate, and congregates the quality score, functional annotation, and population frequency for each variant. Spatiotemporal change for each variant can be visualized, and historical viral haplotype network maps for the course of the outbreak are also generated based on all complete and high-quality genomes available. Moreover, 2019nCoVR provides a full collection of SARS-CoV-2 relevant literature on the coronavirus disease 2019 (COVID-19), including published papers from PubMed as well as preprints from services such as bioRxiv and medRxiv through Europe PMC. Furthermore, by linking with relevant databases in CNCB, 2019nCoVR offers data submission services for raw sequence reads and assembled genomes, and data sharing with NCBI. Collectively, 2019nCoVR is updated daily to collect the latest information on genome sequences, variants, haplotypes, and literature for a timely reflection, making it a valuable resource for the global research community. 2019nCoVR is accessible at https://bigd.big.ac.cn/ncov/.


Database Resources of the National Genomics Data Center in 2020

Li, Yuan, Zhang et al. 2019

128


The National Genomics Data Center (NGDC) provides a suite of database resources to support worldwide research activities in both academia and industry. With the rapid advancements in higher-throughput and lower-cost sequencing technologies and accordingly the huge volume of multi-omics data generated at exponential scales and rates, NGDC is continually expanding, updating and enriching its core database resources through big data integration and value-added curation. In the past year, efforts for update have been mainly devoted to BioProject, BioSample, GSA, GWH, GVM, NONCODE, LncBook, EWAS Atlas and IC4R. Newly released resources include three human genome databases (PGG.SNV, PGG.Han and CGVD), eLMSG, EWAS Data Hub, GWAS Atlas, iSheep and PADS Arsenal. In addition, four web services, namely, eGPS Cloud, BIG Search, BIG Submission and BIG SSO, have been significantly improved and enhanced. All of these resources along with their services are publicly accessible at https://bigd.big.ac.cn.


The global landscape of SARS-CoV-2 genomes, variants, and haplotypes in 2019nCoVR

Song, Zou et al. 2020 (Preprint)


On 22 January 2020, the National Genomics Data Center (NGDC), part of the China National Center for Bioinformation (CNCB), created the 2019 Novel Coronavirus Resource (2019nCoVR), an open-access SARS-CoV-2 information resource. 2019nCoVR features a comprehensive integration of sequence and clinical information for all publicly available SARS-CoV-2 isolates, which are manually curated with value-added annotations and quality evaluated by our in-house automated pipeline. Of particular note, 2019nCoVR performs systematic analyses to generate a dynamic landscape of SARS-CoV-2 genomic variations at a global scale. It provides all identified variants and detailed statistics for each virus isolate, and congregates the quality score, functional annotation, and population frequency for each variant. It also generates visualizations of the spatiotemporal change for each variant and yields historical viral haplotype network maps for the course of the outbreak from all complete and high-quality genomes. Moreover, 2019nCoVR provides a full collection of SARS-CoV-2 relevant literature on COVID-19 (Coronavirus Disease 2019), including published papers from PubMed as well as preprints from services such as bioRxiv and medRxiv through Europe PMC. Furthermore, by linking with relevant databases in CNCB-NGDC, 2019nCoVR offers data submission services for raw sequence reads and assembled genomes, and data sharing with the National Center for Biotechnology Information. Collectively, all SARS-CoV-2 genome sequences, variants, haplotypes and literature are updated daily to provide timely information, making 2019nCoVR a valuable resource for the global research community. 2019nCoVR is accessible at https://bigd.big.ac.cn/ncov/.


Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2023

Wang, Yang, Zhuang et al. 2022

107


The National Genomics Data Center (NGDC), part of the China National Center for Bioinformation (CNCB), provides a family of database resources to support global academic and industrial communities. With the explosive accumulation of multi-omics data generated at an unprecedented rate, CNCB-NGDC constantly expands and updates core database resources by big data archive, integrative analysis and value-added curation. In the past year, efforts have been devoted to integrating multiple omics data, synthesizing the growing knowledge, developing new resources and upgrading a set of major resources. Particularly, several database resources are newly developed for infectious diseases and microbiology (MPoxVR, KGCoV, ProPan), cancer-trait association (ASCancer Atlas, TWAS Atlas, Brain Catalog, CCAS) as well as tropical plants (TCOD). Importantly, given the global health threat caused by monkeypox virus and SARS-CoV-2, CNCB-NGDC has newly constructed the monkeypox virus resource, along with frequent updates of SARS-CoV-2 genome sequences, variants as well as haplotypes. All the resources and services are publicly accessible at https://ngdc.cncb.ac.cn.


Coronavirus GenBrowser for monitoring the transmission and evolution of SARS-CoV-2

Yang, Tang et al. 2020 (Preprint)


COVID-19 has widely spread across the world, and much research is being conducted on the causative virus SARS-CoV-2. To help control the infection, we developed the Coronavirus GenBrowser (CGB) to monitor the pandemic. CGB allows visualization and analysis of the latest viral genomic data. Distributed genome alignments and an evolutionary tree built on the existing subtree are implemented for easy and frequent updates. The tree-based data are compressed at a ratio of 2,760:1, enabling fast access and analysis of SARS-CoV-2 variants. CGB can effectively detect adaptive evolution of specific alleles, such as D614G of the spike protein, in their early stage of spreading. By lineage tracing, the most recent common ancestor, dated in early March 2020, of nine strains collected from six different regions in three continents was found to cause the outbreak in Xinfadi, Beijing, China in June 2020. CGB also revealed that the first COVID-19 outbreak in Washington State was caused by multiple introductions of SARS-CoV-2. To encourage data sharing, CGB credits the person who first discovers any SARS-CoV-2 variant. As CGB is developed with eight different languages, it allows the general public in many regions of the world to easily access pre-analyzed results of more than 132,000 SARS-CoV-2 genomes. CGB is an efficient platform to monitor adaptive evolution and transmission of SARS-CoV-2.


Coronavirus GenBrowser for monitoring the transmission and evolution of SARS-CoV-2

Yang, Tang et al. 2022


Genomic epidemiology is important for studying the COVID-19 pandemic, and more than two million severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomic sequences have been deposited into public databases. However, the exponential increase in sequences poses unprecedented bioinformatic challenges. Here, we present the Coronavirus GenBrowser (CGB), based on a highly efficient analysis framework and a node-picking rendering strategy. In total, 1,002,739 high-quality genomic sequences with the transmission-related metadata were analyzed and visualized. The size of the core data file is only 12.20 MB, highly efficient for clean data sharing. Quick visualization modules and rich interactive operations are provided to explore the annotated SARS-CoV-2 evolutionary tree. CGB binary nomenclature is proposed to name each internal lineage. The pre-analyzed data can be filtered according to user-defined criteria to explore the transmission of SARS-CoV-2. Different evolutionary analyses can also be easily performed, such as the detection of accelerated evolution and ongoing positive selection. Moreover, 75 genomic spots conserved in SARS-CoV-2 but non-conserved in other coronaviruses were identified, which may indicate functional elements specifically important for SARS-CoV-2. The CGB was written in Java and JavaScript. It not only enables users who have no programming skills to analyze millions of genomic sequences, but also offers a panoramic vision of the transmission and evolution of SARS-CoV-2.


HGD: an integrated homologous gene database across multiple species

Duan, Chen et al. 2022


Homology is fundamental to infer genes’ evolutionary processes and relationships with shared ancestry. Existing homolog gene resources vary in terms of inferring methods, homologous relationship and identifiers, posing inevitable difficulties for choosing and mapping homology results from one to another. Here, we present HGD (Homologous Gene Database, https://ngdc.cncb.ac.cn/hgd), a comprehensive homologs resource integrating multi-species, multi-resources and multi-omics, as a complement to existing resources providing public and one-stop data service. Currently, HGD houses a total of 112 383 644 homologous pairs for 37 species, including 19 animals, 16 plants and 2 microorganisms. Meanwhile, HGD integrates various annotations from public resources, including 16 909 homologs with traits, 276 670 homologs with variants, 398 573 homologs with expression and 536 852 homologs with gene ontology (GO) annotations. HGD provides a wide range of omics gene function annotations to help users gain a deeper understanding of gene function.



Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2021