The International HapMap Project was designed to create a genome-wide database of patterns of human genetic variation, with the expectation that these patterns would be useful for genetic association studies of common diseases. This expectation has been amply fulfilled with just the initial output of genome-wide association studies, identifying nearly 100 loci for nearly 40 common diseases and traits. These associations provided new insights into pathophysiology, suggesting previously unsuspected etiologic pathways for common diseases that will be of use in identifying new therapeutic targets and developing targeted interventions based on genetically defined risk. In addition, HapMap-based discoveries have shed new light on the impact of evolutionary pressures on the human genome, suggesting multiple loci important for adapting to disease-causing pathogens and new environments. In this review we examine the origin, development, and current status of the HapMap; its prospects for continued evolution; and its current and potential future impact on biomedical science.

Nonstandard abbreviations used: GWA, genome-wide association; LD, linkage disequilibrium; OR, odds ratio.

approach would be very expensive and would not capture rarer variants or structural variants (such as insertions, deletions, and inversions) that are not identified by genotyping of SNPs. However, the pattern of association among SNPs in the genome suggests a potential shortcut, based on haplotypes and linkage disequilibrium (LD). A haplotype is the combined set of alleles at a number of closely spaced sites on a single chromosome. Nearby SNP alleles tend to be associated with each other, or inherited together more often than expected by chance, because most arise through mutational events that each occur once on an ancestral haplotype background and are inherited with that background, rather than arising multiple times de novo on different backgrounds (18).
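The idea of alleles being "inherited together more often than expected by chance" can be made concrete with a small numerical sketch. The Python below is illustrative only: the function name `ld_stats`, the input layout (one two-site haplotype per chromosome), and the choice of the standard summary statistics D, D′, and r² are assumptions made for this example and are not taken from the text above. It compares the observed frequency of one two-site haplotype with the frequency expected if the two sites were independent.

```python
def ld_stats(haplotypes):
    """Return (D, D', r^2) for two biallelic sites.

    `haplotypes` is a list of (allele_at_site_1, allele_at_site_2) tuples,
    one tuple per sampled chromosome. This layout is a hypothetical
    convention chosen for the sketch.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")

    # Use the alleles on the first chromosome as the reference alleles.
    a, b = haplotypes[0]

    p_a = sum(1 for h in haplotypes if h[0] == a) / n      # frequency of a at site 1
    p_b = sum(1 for h in haplotypes if h[1] == b) / n      # frequency of b at site 2
    p_ab = sum(1 for h in haplotypes if h == (a, b)) / n   # frequency of haplotype a-b

    # D: observed haplotype frequency minus the frequency expected by chance.
    d = p_ab - p_a * p_b

    # D': |D| scaled by the largest magnitude possible for these allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2: squared correlation between the alleles at the two sites.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0

    return d, d_prime, r2


# Ten chromosomes carrying only two haplotypes: the alleles always travel together.
linked = [("A", "G")] * 6 + [("C", "T")] * 4
# Four chromosomes carrying every combination once: no association.
unlinked = [("A", "G"), ("A", "T"), ("C", "G"), ("C", "T")]

print(ld_stats(linked))
print(ld_stats(unlinked))
```

In the first sample the two sites carry the same information, so D is positive and both D′ and r² reach their maximum of 1; in the second sample all three statistics are 0. Nearby SNP alleles in real populations typically show a nonzero association of the first kind.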
This is because for most SNPs the rate of mutation, or novel SNP generation, is relatively low (roughly 10⁻⁸ per site per generation, or 30 new variants per haploid gamete), as are the rate of recombination occurring with each meiosis and the number of generations (roughly 10⁴) between currently living individuals and their most recent common ancestor (3). Each new allele is initially associated with the other allelic variants present on the particular stretch of ancestral DNA on which it arose, and these associations are only slowly broken down over time by recombination between SNPs and generation of new variants (Figure 2) (3). Two polymorphic sites are said to be in LD when their specific alleles are correlated in a population. High LD means that the SNP alleles are almost always inherited together; information about
Glossary

Allele, an alternative form of a gene or SNP, or another type of variant
Biallelic, having only two possible alleles in a variant, typically a SNP
Confidence interval, the range of values surrounding a point estimate, such as an OR, within which the true value is be...