The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
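As a quick arithmetic check on the totals quoted above, the three reported variant classes can be summed. This is only a sketch: the variable names are illustrative, and the counts are the rounded figures given in the abstract, not exact tallies.

```python
# Rounded counts taken from the abstract, expressed in millions of variants.
snps_millions = 84.7
indels_millions = 3.6
structural_variants_millions = 0.06  # 60,000 structural variants

# Sum the three classes to compare against the "over 88 million" headline figure.
total_millions = snps_millions + indels_millions + structural_variants_millions
print(f"Total: about {total_millions:.2f} million variants")
```

The sum of the rounded figures comes to roughly 88.36 million, consistent with the stated "over 88 million variants".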
In response to a need for a general catalog of genome variation to address the large-scale sampling designs required by association studies, gene mapping and evolutionary biology, the National Center for Biotechnology Information (NCBI) has established the dbSNP database [S.T. Sherry, M. Ward and K. Sirotkin (1999) Genome Res., 9, 677-679]. Submissions to dbSNP will be integrated with other sources of information at NCBI such as GenBank, PubMed, LocusLink and the Human Genome Project data. The complete contents of dbSNP are available to the public at http://www.ncbi.nlm.nih.gov/SNP and can also be downloaded in multiple formats via anonymous FTP at ftp://ncbi.nlm.nih.gov/snp/.
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs), Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART) and the PubChem suite of small molecule databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
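The Entrez Programming Utilities mentioned above are reached over HTTP. The following is a minimal sketch of composing a search request URL, not a description of how NCBI recommends using the service. The base URL, the `esearch.fcgi` endpoint and its `db`, `term`, `retmax` and `retmode` parameters are not stated in the abstract and are given here as assumptions about the public E-utilities interface; `build_esearch_url` is a hypothetical helper name. The code only builds the URL and makes no network call.

```python
from urllib.parse import urlencode

# Assumed base URL for the Entrez Programming Utilities (not given in the abstract).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Compose an ESearch request URL (hypothetical helper for illustration)."""
    params = {
        "db": db,           # Entrez database to search, e.g. "pubmed"
        "term": term,       # query string; field tags go in square brackets
        "retmax": retmax,   # maximum number of record identifiers to return
        "retmode": "json",  # ask for a JSON response
    }
    # urlencode percent-escapes reserved characters such as "[" and "]".
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


url = build_esearch_url("pubmed", "dbSNP[Title]", retmax=5)
print(url)
```

Keeping URL construction separate from the actual request makes the query easy to inspect or log before any traffic is sent.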
Strapline: The National Center for Biotechnology Information has created the dbGaP public repository for individual-level phenotype, exposure, genotype, and sequence data, and the associations between them. dbGaP assigns stable, unique identifiers to studies and subsets of information from those studies, including documents, individual phenotypic variables, tables of trait data, sets of genotype data, computed phenotype-genotype associations and groups of study subjects who have given similar consents for use of their data.

Introduction: The technical advances and declining costs for high-throughput genotyping afford investigators fresh opportunities to do increasingly complex analyses of genetic associations with phenotypic and disease characteristics. The leading candidates for such genome-wide association studies (GWAS) are existing large-scale cohort and clinical studies that collected rich sets of phenotype data. To support investigator access to data from these initiatives at the National Institutes of Health (NIH) and elsewhere, the National Center for Biotechnology Information (NCBI) has created a database of Genotypes and Phenotypes (dbGaP) with stable identifiers that make it possible for published studies to discuss or cite the primary data in a specific and uniform way.
dbGaP provides unprecedented access to the large-scale genetic and phenotypic datasets required for GWAS designs, including public access to study documents linked to summary data on specific phenotype variables, statistical overviews of the genetic information, position of published associations on the genome, and authorized access to individual-level data. The purposes of this description of dbGaP are three-fold: (1) to describe dbGaP's functionality for users and submitters; (2) to describe dbGaP's design and operational processes for database methodologists to emulate or improve upon; and (3) to reassure the lay and scientific public that individual-level phenotype and genotype data are securely and responsibly managed. dbGaP accommodates studies of varying design. It contains four basic types of data: (1) Study documentation, including study descriptions, protocol documents, and data collection instruments, such as questionnaires; (2) Phenotypic data for each variable assessed, at both an individual level and in summary form; (3) Genetic data, including study subjects' individual genotypes, pedigree information, fine mapping results, and resequencing traces; and (4) Statistical results, including association and linkage analyses, when available.

Address editorial correspondence to: Stephen Sherry, PhD, National Center for Biotechnology Information, 8600 Rockville Pike, MSC 3804, Bethesda, MD 20894-3804, phone: 301-435-7799, fax: 301-480-5789, e-mail: sherry@ncbi.nlm

To protect the confidentiality of study subjects, dbGaP accepts only de-identified data and requires investigators to go through an authorization process in order to access individual-level phenotype and genotype datasets. Summary phenotype and genotype data, as well as stu...
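The four data types and the two access tiers described above (public access to documents and summary data, authorized access to individual-level data) can be sketched as a small data model. This is an illustrative sketch only, not dbGaP's actual schema or authorization logic: the class names, the `can_access` helper and the identifier strings are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class DataType(Enum):
    """The four basic types of data listed in the text."""
    STUDY_DOCUMENTATION = "study documentation"
    PHENOTYPIC = "phenotypic data"
    GENETIC = "genetic data"
    STATISTICAL_RESULTS = "statistical results"


class Level(Enum):
    """Granularity of a dataset: aggregated summary or per-subject records."""
    SUMMARY = "summary"
    INDIVIDUAL = "individual"


@dataclass(frozen=True)
class Dataset:
    identifier: str  # placeholder string, not a real dbGaP identifier
    data_type: DataType
    level: Level


def can_access(dataset: Dataset, is_authorized: bool) -> bool:
    """Individual-level data requires authorization; summary-level data is public."""
    if dataset.level is Level.INDIVIDUAL:
        return is_authorized
    return True


summary_phenotypes = Dataset("example-summary-001", DataType.PHENOTYPIC, Level.SUMMARY)
individual_genotypes = Dataset("example-individual-001", DataType.GENETIC, Level.INDIVIDUAL)

print(can_access(summary_phenotypes, is_authorized=False))
print(can_access(individual_genotypes, is_authorized=False))
print(can_access(individual_genotypes, is_authorized=True))
```

Modeling the access level separately from the data type reflects the text's point that the same kind of data (for example phenotypes) exists in both a public summary form and a controlled individual-level form.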