Ancestry informative markers (AIMs) are genetic loci showing alleles with large frequency differences between populations. AIMs can be used to estimate biogeographical ancestry at the level of the population, subgroup (e.g. cases and controls) and individual. Ancestry estimates at both the subgroup and individual level can be directly instructive regarding the genetics of the phenotypes that differ qualitatively or in frequency between populations. These estimates can provide a compelling foundation for the use of admixture mapping (AM) methods to identify the genes underlying these traits. We present details of a panel of 34 AIMs and demonstrate how such studies can proceed, by using skin pigmentation as a model phenotype. We have genotyped these markers in two population samples with primarily African ancestry, viz. African Americans from Washington D.C. and an African Caribbean sample from Britain, and in a sample of European Americans from Pennsylvania. In the two African population samples, we observed significant correlations between estimates of individual ancestry and skin pigmentation as measured by reflectometry (R² = 0.21, P < 0.0001 for the African-American sample and R² = 0.16, P < 0.0001 for the British African-Caribbean sample). These correlations confirm the validity of the ancestry estimates and also indicate the high level of population structure related to admixture, a level that characterizes these populations and that is detectable by using other tests to identify genetic structure. We have also applied two methods of admixture mapping to test for the effects of three candidate genes (TYR, OCA2, MC1R) on pigmentation. We show that TYR and OCA2 have measurable effects on skin pigmentation differences between the west African and west European parental populations. This work indicates that it is possible to estimate the individual ancestry of a person based on DNA analysis with a reasonable number of well-defined genetic markers. The implications and applications of ancestry estimates in biomedical research are discussed.
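To make the idea of an individual ancestry estimate concrete, the following is a minimal sketch of a maximum-likelihood estimate for one person under a two-population model. It is not the method used in the abstract above, which does not specify its estimator. The sketch assumes biallelic markers, known parental allele frequencies, and independent markers; the function names and all frequencies shown are hypothetical.

```python
import math

def log_likelihood(m, genotypes, freqs_pop1, freqs_pop2):
    """Log-likelihood of ancestry proportion m (from population 1).

    genotypes: copies (0, 1 or 2) of the reference allele at each marker.
    freqs_pop1, freqs_pop2: reference-allele frequencies in the two
    parental populations (assumed known; hypothetical values in the demo).
    """
    eps = 1e-9  # keeps log() finite when a frequency is 0 or 1
    total = 0.0
    for g, p1, p2 in zip(genotypes, freqs_pop1, freqs_pop2):
        # Expected allele frequency for an individual with ancestry m
        q = m * p1 + (1.0 - m) * p2
        q = min(max(q, eps), 1.0 - eps)
        total += g * math.log(q) + (2 - g) * math.log(1.0 - q)
    return total

def estimate_ancestry(genotypes, freqs_pop1, freqs_pop2, steps=1000):
    """Grid-search maximum-likelihood estimate of m on [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(
        grid,
        key=lambda m: log_likelihood(m, genotypes, freqs_pop1, freqs_pop2),
    )

if __name__ == "__main__":
    # Hypothetical panel of four markers
    pop1 = [0.9, 0.8, 0.1, 0.95]
    pop2 = [0.1, 0.2, 0.9, 0.05]
    person = [2, 1, 0, 2]
    print(estimate_ancestry(person, pop1, pop2))
```

A grid search is used here only because it is easy to verify; with few markers the likelihood is flat near its maximum, so the estimate carries a wide uncertainty.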
Gene flow between genetically distinct populations creates linkage disequilibrium (admixture linkage disequilibrium [ALD]) among all loci (linked and unlinked) that have different allele frequencies in the founding populations. We have explored the distribution of ALD by using computer simulation of two extreme models of admixture: the hybrid-isolation (HI) model, in which admixture occurs in a single generation, and the continuous-gene-flow (CGF) model, in which admixture occurs at a steady rate in every generation. Linkage disequilibrium patterns in African American population samples from Jackson, MS, and from coastal South Carolina resemble patterns observed in the simulated CGF populations, in two respects. First, significant association between two loci (FY and AT3) separated by 22 cM was detected in both samples. The retention of ALD over relatively large (>10 cM) chromosomal segments is characteristic of a CGF pattern of admixture but not of an HI pattern. Second, significant associations were also detected between many pairs of unlinked loci, as observed in the CGF simulation results but not in the simulated HI populations. Such a high rate of association between unlinked markers in these populations could result in false-positive linkage signals in an admixture-mapping study. However, we demonstrate that by conditioning on parental admixture, we can distinguish between true linkage and association resulting from shared ancestry. Therefore, populations with a CGF history of admixture not only are appropriate for admixture mapping but also have greater power for detection of linkage disequilibrium over large chromosomal regions than do populations that have experienced a pattern of admixture more similar to the HI model, if methods are employed that detect and adjust for disequilibrium caused by continuous admixture.
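The contrast between the two admixture models can be illustrated with a simplified deterministic recursion, which is not the stochastic simulation described in the abstract. The sketch assumes two biallelic loci, no linkage disequilibrium within either parental population, and, in the CGF case, migrants arriving from one parental population at a constant rate `alpha` before recombination each generation. It relies on the standard result that mixing two gene pools in proportions (1 - alpha) and alpha produces disequilibrium alpha(1 - alpha) times the product of the allele frequency differences at the two loci. All function names and parameter values are hypothetical.

```python
def ald_hybrid_isolation(m, delta_a, delta_b, theta, generations):
    """ALD after a single admixture event followed by random mating.

    m: admixture proportion; delta_a, delta_b: parental allele frequency
    differences at the two loci; theta: recombination fraction.
    """
    d0 = m * (1.0 - m) * delta_a * delta_b
    return d0 * (1.0 - theta) ** generations

def ald_continuous_gene_flow(alpha, delta_a, delta_b, theta, generations):
    """ALD when a fraction alpha of the gene pool is replaced by migrants
    every generation (mixing first, then recombination)."""
    diff_a, diff_b = delta_a, delta_b  # admixed minus migrant frequencies
    d = 0.0
    for _ in range(generations):
        # Mixing carries over old disequilibrium and creates new ALD
        d = (1.0 - alpha) * d + alpha * (1.0 - alpha) * diff_a * diff_b
        # Allele frequencies move toward the migrant population
        diff_a *= 1.0 - alpha
        diff_b *= 1.0 - alpha
        # Recombination erodes disequilibrium
        d *= 1.0 - theta
    return d

def per_generation_rate(m, generations):
    """Constant rate alpha giving total admixture m after `generations`."""
    return 1.0 - (1.0 - m) ** (1.0 / generations)

if __name__ == "__main__":
    m, delta, t = 0.2, 0.8, 10  # hypothetical values
    alpha = per_generation_rate(m, t)
    for label, theta in (("linked", 0.2), ("unlinked", 0.5)):
        hi = ald_hybrid_isolation(m, delta, delta, theta, t)
        cgf = ald_continuous_gene_flow(alpha, delta, delta, theta, t)
        print(f"{label}: HI D = {hi:.5f}, CGF D = {cgf:.5f}")
```

Under these assumptions the HI disequilibrium between unlinked loci halves every generation, whereas under CGF each new pulse of migrants regenerates it, which is consistent with the associations between unlinked markers reported above.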
We analyzed admixture in samples of six different African-American populations from South Carolina: Gullah-speaking Sea Islanders in coastal South Carolina, residents of four counties in the "Low Country" (Berkeley, Charleston, Colleton, and Dorchester), and persons living in the city of Columbia, located in central South Carolina. We used a battery of highly informative autosomal, mtDNA, and Y-chromosome markers. Two of the autosomal markers (FY and AT3) are linked and lie 22 cM apart on chromosome 1. The results of this study indicate, in accordance with previous historical, cultural, and anthropological evidence, a very low level of European admixture in the Gullah Sea Islanders (m = 3.5 ± 0.9%). The proportion of European admixture is higher in the Low Country (m ranging between 9.9 ± 1.8% and 14.0 ± 1.9%), and is highest in Columbia (m = 17.7 ± 3.1%). A sex-biased European gene flow and a small Native American contribution to the African-American gene pool are also evident in these data. We studied the pattern of pairwise allelic associations between the FY locus and the nine other autosomal markers in our samples. In the combined sample from the Low Country (N = 548), a high level of linkage disequilibrium was observed between the linked markers, FY and AT3. Additionally, significant associations were also detected between FY and 4 of the 8 unlinked markers, suggesting the existence of significant genetic structure in this population. A continuous gene flow model of admixture could explain the observed pattern of genetic structure. A test conditioning on the overall admixture of each individual showed association of ancestry between the two linked markers (FY and AT3), but not between any of the unlinked markers, as theory predicts. Thus, even in the presence of genetic structure due to continuous gene flow or some other factor, it is possible to differentiate associations due to linkage from spurious associations due to genetic structure.
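As an illustration of how a population-level admixture proportion m can be obtained from allele frequencies, the sketch below uses the classical single-locus estimator m = (p_admixed - p_source1) / (p_source2 - p_source1) and an unweighted least-squares combination across loci. The abstract does not state which estimator the authors used, so this is an assumption for illustration only; the function names and all frequencies are hypothetical.

```python
def admixture_single_locus(p_admixed, p_source1, p_source2):
    """Proportion of the admixed gene pool derived from source 2,
    estimated from one allele's frequency in the three populations."""
    delta = p_source2 - p_source1
    if delta == 0:
        raise ValueError("marker is uninformative: sources do not differ")
    return (p_admixed - p_source1) / delta

def admixture_least_squares(p_admixed, p_source1, p_source2):
    """Unweighted least-squares estimate of m across several loci.

    Minimises the sum over loci of
    (p_admixed - [(1 - m) * p_source1 + m * p_source2]) ** 2.
    """
    numerator = 0.0
    denominator = 0.0
    for pa, p1, p2 in zip(p_admixed, p_source1, p_source2):
        delta = p2 - p1
        numerator += (pa - p1) * delta
        denominator += delta * delta
    if denominator == 0:
        raise ValueError("no informative markers")
    return numerator / denominator

if __name__ == "__main__":
    # Hypothetical frequencies at three markers
    source1 = [0.00, 0.10, 0.90]
    source2 = [1.00, 0.80, 0.20]
    admixed = [0.12, 0.184, 0.816]
    print(admixture_least_squares(admixed, source1, source2))
```

Markers with larger frequency differences dominate the least-squares estimate, which is one reason panels of highly differentiated markers are preferred for this purpose.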
Hispanic populations are a valuable resource that can and should facilitate the identification of complex trait genes by means of admixture mapping (AM). In this paper we focus on a particular Hispanic population living in the San Luis Valley (SLV) in Southern Colorado. We used a set of 22 Ancestry Informative Markers (AIMs) to describe the admixture process and dynamics in this population. AIMs are defined as genetic markers that exhibit allele frequency differences of ≥ 30% between parental populations, and are more informative for studying admixed populations than random markers. The ancestral proportions of the SLV Hispanic population are estimated as 62.7 ± 2.1% European, 34.1 ± 1.9% Native American and 3.2 ± 1.5% West African. We also estimated the ancestral proportions of individuals using these AIMs. Population structure was demonstrated by the excess association of unlinked markers, the correlation between estimates of admixture based on unlinked marker sets, and by a highly significant correlation between individual Native American ancestry and skin pigmentation (R² = 0.082, p < 0.001). We discuss the implications of these findings in disease gene mapping efforts.
It is possible to estimate the proportionate contributions of ancestral populations to admixed individuals or populations using genetic markers, but different loci and alleles vary considerably in the amount of information that they provide. Conventionally, the allele frequency difference between parental populations (delta) has been used as the criterion to select informative markers. However, it is unclear how to use delta for multiallelic loci, or populations formed by the mixture of more than two groups. Moreover, several other factors, including the actual ancestral proportions and the relative genetic diversities of the parental populations, affect the information provided by genetic markers. We demonstrate here that using delta as the sole criterion for marker selection is inadequate, and we propose, instead, to use Fisher's information, which is the inverse of the variance of the estimated ancestral contributions. This measure is superior because it is directly related to the precision of ancestry estimates. Although delta is related to Fisher's information, the relationship is neither linear nor simple, and the information can vary widely for markers with identical deltas. Fortunately, Fisher's information is easily computed and formally extends to the situation of multiple alleles and/or parental populations. We examined the distribution of information for SNP and microsatellite loci available in the public domain for a variety of model admixed populations. The information, on average, is higher for microsatellite loci, but exceptional SNPs exceed the best microsatellites. Despite the large number of genetic markers that have been identified for admixture analysis, it appears that information for estimating admixture proportions is limited, and estimates will typically have wide confidence intervals.
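The claim that markers with identical delta can carry different information can be checked in the simplest case: one biallelic marker and two parental populations. If an allele sampled from the admixed population is the reference allele with probability q = m·p1 + (1 - m)·p2, the Fisher information about m from that single allele is delta² / (q(1 - q)), where delta = p1 - p2. The sketch below works through that formula; it is a textbook special case rather than the general multiallelic measure described in the abstract, and the function names and marker frequencies are hypothetical.

```python
import math

def fisher_information(m, p1, p2):
    """Per-allele Fisher information about the ancestry proportion m
    for a biallelic marker with parental frequencies p1 and p2."""
    q = m * p1 + (1.0 - m) * p2  # allele frequency in the admixed pool
    if q <= 0.0 or q >= 1.0:
        raise ValueError("admixed allele frequency must lie strictly in (0, 1)")
    delta = p1 - p2
    return delta * delta / (q * (1.0 - q))

def standard_error(m, markers, n_alleles_per_marker):
    """Approximate standard error of the estimate of m, assuming
    independent markers whose information simply adds up."""
    total = sum(
        n_alleles_per_marker * fisher_information(m, p1, p2)
        for p1, p2 in markers
    )
    return 1.0 / math.sqrt(total)

if __name__ == "__main__":
    m = 0.5
    # Two hypothetical markers with the same delta of 0.5
    marker_a = (0.50, 0.00)
    marker_b = (0.75, 0.25)
    print(fisher_information(m, *marker_a))  # larger: q = 0.25
    print(fisher_information(m, *marker_b))  # smaller: q = 0.50
```

Both markers have delta = 0.5, yet the first carries one third more information at m = 0.5, because information also depends on how far the admixed allele frequency sits from 0.5.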