Despite great progress in identifying genetic variants that influence human disease, most inherited risk remains unexplained. A more complete understanding requires genome-wide studies that fully examine less common alleles in populations with a wide range of ancestry. To inform the design and interpretation of such studies, we genotyped 1.6 million common single nucleotide polymorphisms (SNPs) in 1,184 reference individuals from 11 global populations, and sequenced ten 100-kilobase regions in 692 of these individuals. This integrated data set of common and rare alleles, called ‘HapMap 3’, includes both SNPs and copy number polymorphisms (CNPs). We characterized population-specific differences among low-frequency variants, measured the improvement in imputation accuracy afforded by the larger reference panel, especially in imputing SNPs with a minor allele frequency of ≤5%, and demonstrated the feasibility of imputing newly discovered CNPs and SNPs. This expanded public resource of genome variants in global populations supports deeper interrogation of genomic variation and its role in human disease, and serves as a step towards a high-resolution map of the landscape of human genetic variation.
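As an illustration of how imputation accuracy at a single SNP can be quantified, the sketch below computes the squared correlation between true allele counts and imputed dosages. This metric is an assumption chosen for illustration; the abstract above does not say which measure was used, and the function name `dosage_r2` and the example values are hypothetical.

```python
import math


def dosage_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true allele counts (0, 1 or 2)
    and imputed dosages (real values in [0, 2]) at one SNP."""
    n = len(true_genotypes)
    if n == 0 or n != len(imputed_dosages):
        raise ValueError("inputs must be non-empty and of equal length")

    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n

    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)

    # The correlation is undefined when either vector has no variation,
    # for example at a site that is monomorphic in the test sample.
    if var_t == 0 or var_d == 0:
        return math.nan
    return (cov * cov) / (var_t * var_d)


# Hypothetical example: four individuals at one low-frequency SNP.
truth = [0, 0, 1, 0]
imputed = [0.05, 0.10, 0.80, 0.00]
print(dosage_r2(truth, imputed))
```

Low-frequency SNPs have little variance in allele count, so a few imputation errors move this value a long way.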
Accurate and complete measurement of single nucleotide (SNP) and copy number (CNV) variants, both common and rare, will be required to understand the role of genetic variation in disease. We present Birdsuite, a four-stage analytical framework instantiated in software for deriving integrated and mutually consistent copy number and SNP genotypes. The method sequentially assigns copy number across regions of common copy number polymorphisms (CNPs), calls genotypes of SNPs, identifies rare CNVs via a hidden Markov model (HMM), and generates an integrated sequence and copy number genotype at every locus (for example, including genotypes such as A-null, AAB and BBB in addition to AA, AB and BB calls). Such genotypes more accurately depict the underlying sequence of each individual, reducing the rate of apparent mendelian inconsistencies. The Birdsuite software is applied here to data from the Affymetrix SNP 6.0 array. Additionally, we describe a method, implemented in PLINK, to utilize these combined SNP and CNV genotypes for association testing with a phenotype.
URLs. Birdsuite, http://www.broad.mit.edu/mpg/birdsuite/; PLINK CNV analysis tools, http://pngu.mgh.harvard.edu/purcell/plink/cnv; 1000 Genomes, http://www.1000genomes.org.
Competing interests statement. The authors declare competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturegenetics/.
Studies of SNPs and CNVs in human disease have to date been built on different analytical approaches, and on somewhat conflicting assumptions. Specifically, SNP genotyping methods 1,2 assume that every individual has two copies of each locus, whereas studies of copy number variation assume that individuals vary in their copy number across the genome. Because SNPs and CNVs coexist throughout the genome, they influence one another's measurement, and may act both separately and in concert to influence human phenotypes.
Ignoring CNVs during SNP genotyping results in failure to capture the true underlying sequence at many sites (genotypes like AAB and A), and can create the appearance of violations of mendelian inheritance or Hardy-Weinberg equilibrium where none in fact exists 3,4. Ignoring SNPs in copy number analysis fails to incorporate allele-specific gains and losses, as well as the potential to exploit linkage disequilibrium between CNVs and nearby SNPs. In addition, methods for copy number analysis have not previously separated the genotyping of known copy number polymorphisms (CNPs) from the discovery of rare (and thus previously unobserved) copy number variants (CNVs) 5. In the former case, as in SNP genotyping, existing information about known polymorphisms can be used to design arrays, train clustering algorithms and assign a prior probability of aberrant copy number to guide interpretation of measurements. Discovery of rare variants, as in sequence analysis for rare mutations, is a much more difficult problem, both because it is more difficult to detect a single event tha...
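The point about apparent mendelian violations can be made concrete with a small sketch. It is not Birdsuite's algorithm. It is a hypothetical consistency check, restricted to deletion loci where each chromosome carries at most one copy, with genotypes written as allele strings ("AB", "AA", "A" for A-null, "" for a homozygous deletion). The function names are illustrative assumptions.

```python
def transmissible(genotype):
    """Alleles one parent can pass on at a deletion locus.

    The empty string stands for the deleted (null) allele. Assumes at most
    one copy per chromosome, so genotypes have length 0, 1 or 2.
    """
    if len(genotype) > 2:
        raise ValueError("this sketch only handles copy number 0, 1 or 2")
    haplotypes = set(genotype)      # each observed allele may be transmitted
    if len(genotype) < 2:
        haplotypes.add("")          # a missing copy means a null chromosome
    return haplotypes


def mendelian_consistent(father, mother, child):
    """True if the child genotype can be built from one transmitted
    haplotype of each parent."""
    target = "".join(sorted(child))
    return any(
        "".join(sorted(f + m)) == target
        for f in transmissible(father)
        for m in transmissible(mother)
    )


# A diploid-only caller reports hemizygous individuals as homozygotes:
# father A-null -> "AA", mother B-null -> "BB", child A-null -> "AA".
print(mendelian_consistent("AA", "BB", "AA"))  # False: apparent violation

# With copy-number-aware genotypes the same trio is consistent.
print(mendelian_consistent("A", "B", "A"))     # True
```

Handling duplications (genotypes such as AAB) would require enumerating how alleles may be split across the two chromosomes, which this sketch deliberately leaves out.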
We have systematically compared copy number variant (CNV) detection on eleven microarrays to evaluate data quality and CNV calling, reproducibility, concordance across array platforms and laboratory sites, breakpoint accuracy and analysis tool variability. Different analytic tools applied to the same raw data typically yield CNV calls with <50% concordance. Moreover, reproducibility in replicate experiments is <70% for most platforms. Nevertheless, these findings should not preclude detection of large CNVs for clinical diagnostic purposes because large CNVs with poor reproducibility are found primarily in complex genomic regions and would typically be removed by standard clinical data curation. The striking differences between CNV calls from different platforms and analytic tools highlight the importance of careful assessment of experimental design in discovery and association studies and of strict data curation and filtering in diagnostics. The CNV resource presented here allows independent data evaluation and provides a means to benchmark new algorithms.
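To illustrate what concordance between two CNV call sets can mean in practice, the sketch below counts a call as matched when it has at least 50% reciprocal overlap with a call from the other set. Both the 50% threshold and the reciprocal-overlap criterion are assumptions made for illustration, not necessarily the definition used in the study above. The function names and coordinates are hypothetical.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions for calls a and b.

    Each call is a (chromosome, start, end) tuple with start < end.
    Returns 0.0 when the calls are on different chromosomes or disjoint.
    """
    if a[0] != b[0]:
        return 0.0
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return 0.0
    return min(shared / (a[2] - a[1]), shared / (b[2] - b[1]))


def concordance(calls_a, calls_b, threshold=0.5):
    """Fraction of calls in calls_a matched by at least one call in
    calls_b with reciprocal overlap >= threshold."""
    if not calls_a:
        return 0.0
    matched = sum(
        1 for a in calls_a
        if any(reciprocal_overlap(a, b) >= threshold for b in calls_b)
    )
    return matched / len(calls_a)


# Hypothetical calls from two analysis tools run on the same sample.
tool_1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
tool_2 = [("chr1", 120, 210), ("chr1", 590, 900)]
print(concordance(tool_1, tool_2))
```

This measure is asymmetric: `concordance(tool_1, tool_2)` and `concordance(tool_2, tool_1)` can differ, so reporting both directions gives a fuller picture. A stricter comparison would also require the calls to agree on gain versus loss.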