The proportion of human genetic variation due to differences between populations is modest, and individuals from different populations can be genetically more similar than individuals from the same population. Yet sufficient genetic data can permit accurate classification of individuals into populations. Both findings can be obtained from the same data set, using the same number of polymorphic loci. This article explains why. Our analysis focuses on the frequency, ω, with which a pair of random individuals from two different populations is genetically more similar than a pair of individuals randomly selected from any single population. We compare ω to the error rates of several classification methods, using data sets that vary in number of loci, average allele frequency, populations sampled, and polymorphism ascertainment strategy. We demonstrate that classification methods achieve higher discriminatory power than ω because of their use of aggregate properties of populations. The number of loci analyzed is the most critical variable: with 100 polymorphisms, accurate classification is possible, but ω remains sizable, even when using populations as distinct as sub-Saharan Africans and Europeans. Phenotypes controlled by a dozen or fewer loci can therefore be expected to show substantial overlap between human populations. This provides empirical justification for caution when using population labels in biomedical settings, with broad implications for personalized medicine, pharmacogenetics, and the meaning of race.
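The contrast drawn above, between the pairwise-similarity frequency and a classifier's error rate, can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' method or data: the allele frequencies are randomly generated, the Manhattan distance and the leave-one-out nearest-centroid classifier are stand-ins chosen for brevity, and the names `run`, `dissimilarity_fraction`, and `centroid_error` are hypothetical.

```python
import random


def simulate_population(freqs, n, rng):
    """Draw n diploid genotypes; each locus holds an allele count of 0, 1 or 2."""
    return [[(rng.random() < p) + (rng.random() < p) for p in freqs]
            for _ in range(n)]


def distance(a, b):
    """Manhattan distance between two genotype vectors (an assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def dissimilarity_fraction(pop_a, pop_b, trials, rng):
    """Estimate how often a between-population pair is more similar than a
    pair drawn from a single population. Ties count as one half."""
    hits = 0.0
    for _ in range(trials):
        w1, w2 = rng.sample(rng.choice((pop_a, pop_b)), 2)
        b1, b2 = rng.choice(pop_a), rng.choice(pop_b)
        d_within, d_between = distance(w1, w2), distance(b1, b2)
        if d_between < d_within:
            hits += 1.0
        elif d_between == d_within:
            hits += 0.5
    return hits / trials


def centroid_error(pop_a, pop_b):
    """Leave-one-out error rate of a nearest-centroid classifier."""
    def centroid(pop, skip=None):
        rows = [g for g in pop if g is not skip]
        return [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(g, c):
        return sum((x - y) ** 2 for x, y in zip(g, c))

    errors = 0
    for own, other in ((pop_a, pop_b), (pop_b, pop_a)):
        other_c = centroid(other)
        for g in own:
            # Exclude the individual from its own population's centroid.
            if sq_dist(g, other_c) < sq_dist(g, centroid(own, skip=g)):
                errors += 1
    return errors / (len(pop_a) + len(pop_b))


def run(num_loci, n=50, trials=2000, seed=1):
    """Simulate two populations and return (pairwise fraction, centroid error).
    The frequency model below is an arbitrary assumption for illustration."""
    rng = random.Random(seed)
    freqs_a, freqs_b = [], []
    for _ in range(num_loci):
        p = rng.uniform(0.1, 0.9)
        shift = rng.uniform(-0.3, 0.3)  # assumed between-population difference
        freqs_a.append(p)
        freqs_b.append(min(0.95, max(0.05, p + shift)))
    pop_a = simulate_population(freqs_a, n, rng)
    pop_b = simulate_population(freqs_b, n, rng)
    return (dissimilarity_fraction(pop_a, pop_b, trials, rng),
            centroid_error(pop_a, pop_b))


if __name__ == "__main__":
    for loci in (10, 100, 300):
        print(loci, run(loci))
```

The classifier compares each individual with population centroids, which average over many individuals, whereas the pairwise statistic compares single individuals with each other. Under these simulated assumptions, that aggregation is why the classification error is expected to fall faster than the pairwise fraction as loci are added.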
Background/Aims: The L1 retrotransposable element family is the most successful self-replicating genomic parasite of the human genome. L1 elements drive replication of Alu elements, and both have had far-reaching impacts on the human genome. We use L1 and Alu insertion polymorphisms to analyze human population structure. Methods: We genotyped 75 recent, polymorphic L1 insertions in 317 individuals from 21 populations in sub-Saharan Africa, East Asia, Europe and the Indian subcontinent. This is the first sample of L1 loci large enough to support detailed population genetic inference. We analyzed these data in parallel with a set of 100 polymorphic Alu insertion loci previously genotyped in the same individuals. Results and Conclusion: The data sets yield congruent results that support the recent African origin model of human ancestry. A genetic clustering algorithm detects clusters of individuals corresponding to continental regions. The number of loci sampled is critical: with fewer than 50 typical loci, structure cannot be reliably discerned in these populations. The inclusion of geographically intermediate populations (from India) reduces the distinctness of clustering. Our results indicate that human genetic variation is neither perfectly correlated with geographic distance (purely clinal) nor independent of distance (purely clustered), but a combination of both: stepped clinal.
Families with early-onset Alzheimer’s disease (AD) sharing a single PSEN2 mutation exhibit a wide range of age-at-onset, suggesting that modifier loci segregate within these families. While APOE is known to be an age-at-onset modifier, it does not explain all of this variation. We performed a genome scan within nine such families for loci influencing age-at-onset, while simultaneously controlling for variation in the primary PSEN2 mutation (N141I) and APOE. We found significant evidence of linkage between age-at-onset and chromosome 1q23.3 (P < 0.001) when analysis included all families, and to chromosomes 1q23.3 (P < 0.001), 17p13.2 (P = 0.0002), 7q33 (P = 0.017), and 11p14.2 (P = 0.017) in a single large pedigree. Simultaneous analysis of these four chromosomes maintained strong evidence of linkage to chromosomes 1q23.3 and 17p13.2 when all families were analyzed, and to chromosomes 1q23.3, 7q33, and 17p13.2 within the same single pedigree. Inclusion of major gene covariates proved essential to detect these linkage signals, as all linkage signals dissipated when PSEN2 and APOE were excluded from the model. The four chromosomal regions with evidence of linkage all coincide with previous linkage signals, associated SNPs, and/or candidate genes identified in independent AD study populations. This study establishes several candidate regions for further analysis and is consistent with an oligogenic model of AD risk and age-at-onset. More generally, this study also demonstrates the value of searching for modifier loci in existing datasets previously used to identify primary causal variants for complex disease traits.
Alzheimer’s disease (AD) is a common neurodegenerative disorder of late life with a complex genetic basis. Although several genes are known to play a role in rare early-onset AD, only the APOE gene is known to contribute substantially to risk of the common late-onset form of the disease (LOAD, onset > 60 years). APOE genotypes vary in their AD risk as well as age-at-onset distributions, and it is likely that other loci will similarly affect AD age-at-onset. Here we present the first analysis of age-at-onset in the NIMH LOAD sample that allows for both a multilocus trait model and genetic heterogeneity among the contributing sites, while at the same time accommodating age censoring, effects of known genetic covariates, and full pedigree and marker information. The results provide evidence for genomic regions not previously implicated in this data set, including regions on chromosomes 7q, 15, and 19p. They also affirm evidence for loci on chromosomes 1q, 6p, 9q, 11, and, of course, the APOE locus on 19q, all of which have been reported previously in the same sample. The analyses failed to find evidence for linkage to chromosome 10 with inclusion of unaffected subjects and extended pedigrees. Several regions implicated in these analyses in the NIMH sample have been previously reported in genome scans of other AD samples. These results, therefore, provide independent confirmation of AD loci in family-based samples on chromosomes 1q, 7q, and 19p, and suggest that further efforts towards identifying the underlying causal loci are warranted.
We explored the utility of population- and pedigree-based analyses using the Framingham Heart Study genome-wide 50k single-nucleotide polymorphism marker data provided for Genetic Analysis Workshop 16. Our aims were: 1) to compare identity-by-descent sharing estimates from variable amounts of data; 2) to apply each of these estimates to a case-control association study designed to control for relatedness among samples; and 3) to contrast these results with those obtained using model-based and model-free linkage analysis methods.