The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
African-Americans have higher rates of kidney disease than European-Americans. Here we show that in African-Americans, focal segmental glomerulosclerosis (FSGS) and hypertension-attributed end-stage kidney disease (H-ESKD) are associated with two independent sequence variants in the APOL1 gene on chromosome 22 [FSGS odds ratio = 10.5 (95% CI 6.0–18.4); H-ESKD odds ratio = 7.3 (95% CI 5.6–9.5)]. The two APOL1 variants are common in African chromosomes but absent from European chromosomes and both reside within haplotypes that harbor signatures of positive selection. ApoL1 is a serum factor that lyses trypanosomes. In vitro assays revealed that only the kidney disease-associated ApoL1 variants lysed Trypanosoma brucei rhodesiense. We speculate that evolution of a critical survival factor in Africa may have contributed to the high rates of renal disease in African-Americans.
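The odds ratios and 95% confidence intervals quoted above can be read against the standard 2×2-table calculation. The sketch below is only an illustration of that arithmetic: it assumes a simple Wald interval on the log odds ratio, and the counts in the usage example are hypothetical, not data from the study. The abstract does not state which estimator or genetic model the authors used.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: cases carrying the risk genotype      b: cases without it
    c: controls carrying the risk genotype   d: controls without it
    z: normal quantile (1.96 gives an approximate 95% interval)
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Wald approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the function.
    or_, lo, hi = odds_ratio_ci(20, 10, 10, 20)
    print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

An interval that excludes 1, as both intervals in the abstract do, indicates an association at roughly the 5% significance level under this approximation.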
The increased burden of chronic kidney and end-stage kidney diseases (ESKD) in populations of African ancestry has been largely unexplained. To identify genetic variants predisposing to idiopathic and HIV-1-associated focal segmental glomerulosclerosis (FSGS), we carried out an admixture-mapping linkage-disequilibrium genome scan on 190 African American individuals with FSGS and 222 controls. We identified a chromosome 22 region with a genome-wide logarithm of the odds (lod) score of 9.2 and a peak lod of 12.4 centered on MYH9, a functional candidate gene expressed in kidney podocytes. Multiple MYH9 SNPs and haplotypes were recessively associated with FSGS, most strongly a haplotype spanning exons 14 through 23 (OR = 5.0, 95% CI = 3.5–7.1; P = 4 × 10^−23, n = 852). This association extended to hypertensive ESKD (OR = 2.2, 95% CI = 1.5–3.4; n = 433), but not type 2 diabetic ESKD (n = 476). Genetic variation at the MYH9 locus substantially explains the increased burden of FSGS and hypertensive ESKD among African Americans.
The prevalence of chronic kidney disease (CKD) in the United States is currently estimated at 13% and is associated with significant morbidity and mortality [1]. Approximately 100,000 Americans develop end-stage kidney (renal) disease (ESKD) each year. The cumulative lifetime risk for ESKD varies by ancestry, and is approximately 7.5% for African Americans and 2.1% for European Americans [2]. African Americans have a disproportionate risk for several forms of CKD, among them diabetic nephropathy [3], hypertensive nephrosclerosis [4], lupus nephritis [5], focal segmental glomerulosclerosis (FSGS) [6] and HIV-associated nephropathy (a distinct form of FSGS, also termed collapsing glomerulopathy) [7, 8].
The disproportionate risk for CKD may be partially explained by differences in social-economic status, lifestyle factors and clinical factors such as blood pressure control, but most of the increased risk remains unexplained [9].
FSGS is a clinical syndrome involving podocyte injury and glomerular scarring, and includes genetic forms with autosomal dominant or recessive mendelian inheritance, reactive forms associated with other illnesses (including HIV-1 disease) or medications, and a sporadic, idiopathic form, which accounts for the majority of cases [10]. Recent data suggest an increase in the incidence of FSGS, which currently accounts for up to 3% of ESKD cases [6]. African Americans have a fourfold increased risk for sporadic FSGS [11] and an 18- to 50-fold increased risk for HIV-1-associated FSGS [7, 12]. Individuals of African descent also have increased risk for FSGS in other geographic regions, further suggesting that genetic factors contribute to these disparities [11].
A strategy for identifying genes underlying such ancestry-driven health disparities is mapping by admixture linkage disequilibrium (MALD). MALD has successfully identified a genomic region associated with prostate cancer [13], subsequently replicated by a genome-wide association study [14], as well as genes associated with hypertension [15], multiple scl...
Detecting recently selected ‘genomic footprints’ applies directly to the discovery of disease genes and to the imputation of the formative events that molded modern population genetic structure. The imprints of historic selection/adaptation episodes left in human and animal genomes allow one to interpret modern and ancestral gene origins and modifications. Current approaches to reveal selected regions applied in genome-wide selection scans (GWSSs) fall into eight principal categories: (I) phylogenetic footprinting, (II) detecting increased rates of functional mutations, (III) evaluating divergence versus polymorphism, (IV) detecting extended segments of linkage disequilibrium, (V) evaluating local reduction in genetic variation, (VI) detecting changes in the shape of the frequency distribution (spectrum) of genetic variation, (VII) assessing differentiation between populations (FST), and (VIII) detecting excess or decrease in admixture contribution from one population. Here, we review and compare these approaches using available human genome-wide datasets to provide independent verification (or not) of regions found by different methods and using different populations. The lessons learned from GWSSs will be applied to identify genome signatures of historic selective pressures on genes and gene regions in other species with emerging genome sequences. This would offer considerable potential for genome annotation in functional, developmental and evolutionary contexts.
There is great scientific and popular interest in understanding the genetic history of populations in the Americas. We wish to understand when different regions of the continent were inhabited, where settlers came from, and how current inhabitants relate genetically to earlier populations. Recent studies unraveled parts of the genetic history of the continent using genotyping arrays and uniparental markers. The 1000 Genomes Project provides a unique opportunity for improving our understanding of population genetic history by providing over a hundred sequenced low coverage genomes and exomes from Colombian (CLM), Mexican-American (MXL), and Puerto Rican (PUR) populations. Here, we explore the genomic contributions of African, European, and especially Native American ancestry to these populations. Estimated Native American ancestry is in MXL, in CLM, and in PUR. Native American ancestry in PUR is most closely related to populations surrounding the Orinoco River basin, confirming the Southern America ancestry of the Taíno people of the Caribbean. We present new methods to estimate the allele frequencies in the Native American fraction of the populations, and model their distribution using a demographic model for three ancestral Native American populations. These ancestral populations likely split in close succession: the most likely scenario, based on a peopling of the Americas thousand years ago (kya), supports that the MXL Ancestors split kya, with a subsequent split of the ancestors to CLM and PUR kya. The model also features effective populations of in Mexico, in Colombia, and in Puerto Rico. Modeling Identity-by-descent (IBD) and ancestry tract length, we show that post-contact populations also differ markedly in their effective sizes and migration patterns, with Puerto Rico showing the smallest effective size and the earlier migration from Europe. 
Finally, we compare IBD and ancestry assignments to find evidence for relatedness among European founders to the three populations.
Background: Patterns of genetic and genomic variance are informative in inferring population history for human, model species and endangered populations.
Results: Here the genome sequence of wild-born African cheetahs reveals extreme genomic depletion in SNV incidence, SNV density, SNVs of coding genes, MHC class I and II genes, and mitochondrial DNA SNVs. Cheetah genomes are on average 95% homozygous compared to the genomes of the outbred domestic cat (24.08% homozygous), Virunga Mountain Gorilla (78.12%), inbred Abyssinian cat (62.63%), Tasmanian devil, domestic dog and other mammalian species. Demographic estimators impute two ancestral population bottlenecks: one >100,000 years ago coincident with cheetah migrations out of the Americas and into Eurasia and Africa, and a second 11,084–12,589 years ago in Africa coincident with late Pleistocene large mammal extinctions. MHC class I gene loss and dramatic reduction in functional diversity of MHC genes would explain why cheetahs ablate skin graft rejection among unrelated individuals. Significant excess of non-synonymous mutations in AKAP4 (p < 0.02), a gene mediating spermatozoon development, indicates cheetah fixation of five function-damaging amino acid variants distinct from AKAP4 homologues of other Felidae or mammals; AKAP4 dysfunction may cause the cheetah’s extremely high (>80%) pleiomorphic sperm.
Conclusions: The study provides an unprecedented genomic perspective for the rare cheetah, with potential relevance to the species’ natural history, physiological adaptations and unique reproductive disposition.
Electronic supplementary material: The online version of this article (doi:10.1186/s13059-015-0837-4) contains supplementary material, which is available to authorized users.
Background: Adaptive alleles may rise in frequency as a consequence of positive selection, creating a pattern of decreased variation in the neighboring loci, known as a selective sweep. When the region containing this pattern is compared to another population with no history of selection, a rise in variance of allele frequencies between populations is observed. One challenge presented by large genome-wide datasets is differentiating patterns that are remnants of natural selection from those expected to arise at random and/or as a consequence of selectively neutral demographic forces acting in the population.
Findings: SmileFinder is a simple program that looks for diversity and divergence patterns consistent with selective sweeps by evaluating allele frequencies in windows, including neighboring loci from two or more populations of a diploid species, against the genome-wide neutral expectation. The program calculates the mean of heterozygosity and FST in a set of sliding windows of incrementally increasing sizes, and then builds a resampled distribution (the baseline) of random multi-locus sets matched to the sizes of sliding windows, using an unrestricted sampling. Percentiles of the values in the sliding windows are derived from the superimposed resampled distribution. The resampling can easily be scaled from 1 K to 100 M; the higher the number, the more precise the percentiles ascribed to the extreme observed values.
Conclusions: The output from SmileFinder can be used to plot percentile values along chromosome maps to look for population diversity and divergence patterns that may suggest past actions of positive selection, and to compare lists of suspected candidate genes against random gene sets to test for the overrepresentation of these patterns among gene categories. Both applications of the algorithm have already been used in published studies.
Here we present a publicly available, open source program that will serve as a useful tool for preliminary scans of selection using worldwide databases of human genetic variation, as well as population datasets for many non-human species, from which such data is rapidly emerging with the advent of new genotyping and sequencing technologies.
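The window-and-resampling procedure described in the Findings can be sketched roughly as follows. This is a minimal illustration and not the SmileFinder code. All function names are hypothetical, the FST estimator shown is a simple (H_T − H_S)/H_T form assumed for the example, windows are defined by a half-width in loci, and windows at chromosome ends are compared against a baseline for the full window size as a simplification.

```python
import bisect
import random
from statistics import mean


def heterozygosity(p):
    """Expected heterozygosity at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst(p1, p2):
    """Two-population FST as (H_T - H_S) / H_T (an assumed, simple estimator)."""
    h_t = heterozygosity((p1 + p2) / 2.0)
    if h_t == 0.0:
        return 0.0  # the same allele is fixed in both populations
    h_s = (heterozygosity(p1) + heterozygosity(p2)) / 2.0
    return (h_t - h_s) / h_t


def window_means(values, half_width):
    """Mean of per-locus values in a window centred on each locus."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        out.append(mean(values[lo:hi]))
    return out


def resampled_baseline(values, n_loci, n_resamples, rng):
    """Sorted means of random n_loci-sized sets, drawn with replacement."""
    return sorted(mean(rng.choices(values, k=n_loci)) for _ in range(n_resamples))


def percentile(baseline, observed):
    """Fraction of baseline means that are <= the observed window mean."""
    return bisect.bisect_right(baseline, observed) / len(baseline)


def scan(values, half_widths, n_resamples=1000, seed=0):
    """Percentile of every window mean, for each window half-width."""
    rng = random.Random(seed)
    result = {}
    for w in half_widths:
        baseline = resampled_baseline(values, 2 * w + 1, n_resamples, rng)
        result[w] = [percentile(baseline, m) for m in window_means(values, w)]
    return result


if __name__ == "__main__":
    # Toy heterozygosity track with a dip in the middle (hypothetical data).
    het = [0.4] * 20 + [0.0] * 5 + [0.4] * 20
    percentiles = scan(het, half_widths=[2, 4])
    print(min(percentiles[2]), max(percentiles[2]))
```

In this sketch a sweep-like region would show a low percentile for windowed heterozygosity and, if the same scan is run on per-locus FST values, a high percentile for divergence. Increasing `n_resamples` gives finer resolution for extreme percentiles, which matches the abstract's remark about scaling the resampling.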
Contemporary genetic variation among Latin American human groups reflects population migrations shaped by complex historical, social and economic factors. Consequently, admixture patterns may vary by geographic regions ranging from countries to neighborhoods. We examined the geographic variation of admixture across the island of Puerto Rico and the degree to which it could be explained by historic and social events. We analyzed a census-based sample of 642 Puerto Rican individuals that were genotyped for 93 ancestry informative markers (AIMs) to estimate African, European and Native American ancestry. Socioeconomic status (SES) data and geographic location were obtained for each individual. There was significant geographic variation of ancestry across the island. In particular, African ancestry demonstrated a decreasing East to West gradient that was partially explained by historical factors linked to the colonial sugar plantation system. SES also demonstrated a parallel decreasing cline from East to West. However, at a local level, SES and African ancestry were negatively correlated. European ancestry was strongly negatively correlated with African ancestry and therefore showed patterns complementary to African ancestry. By contrast, Native American ancestry showed little variation across the island and across individuals and appears to have played little social role historically. The observed geographic distributions of SES and genetic variation relate to historical social events and mating patterns, and have substantial implications for the design of studies in the recently admixed Puerto Rican population. More generally, our results demonstrate the importance of incorporating social and geographic data with genetics when studying contemporary admixed populations.