The origin of Tibetans remains one of the most contentious puzzles in history, anthropology, and genetics. Analyses of deeply sequenced (30×–60×) genomes of 38 Tibetan highlanders and 39 Han Chinese lowlanders, together with available data on archaic and modern humans, allow us to comprehensively characterize the ancestral makeup of Tibetans and uncover their origins. Non-modern human sequences compose ∼6% of the Tibetan gene pool and form unique haplotypes in some genomic regions, where Denisovan-like, Neanderthal-like, ancient-Siberian-like, and unknown ancestries are entangled and elevated. The shared ancestry of Tibetan-enriched sequences dates back to ∼62,000–38,000 years ago, predating the Last Glacial Maximum (LGM) and representing early colonization of the plateau. Nonetheless, most of the Tibetan gene pool is of modern human origin and diverged from that of Han Chinese ∼15,000–9,000 years ago, a split that can be largely attributed to post-LGM arrivals. Analysis of ∼200 contemporary populations showed that Tibetans share ancestry with populations from East Asia (∼82%), Central Asia and Siberia (∼11%), South Asia (∼6%), and western Eurasia and Oceania (∼1%). Our results support the view that Tibetans arose from a mixture of multiple ancestral gene pools, but their origins are much more complicated and ancient than previously suspected. We provide compelling evidence for the coexistence of Paleolithic and Neolithic ancestries in the Tibetan gene pool, indicating genetic continuity between prehistoric highland foragers and present-day Tibetans. In particular, highly differentiated sequences harbored in highlanders' genomes were most likely inherited from pre-LGM settlers of multiple ancestral origins (Siberian, Unknown, Neanderthal, and Denisovan; SUNDer) and maintained at high frequency by natural selection.
The Uyghur people residing in Xinjiang, a territory in the far west of China crossed by the Silk Road, are a key ethnic group for understanding the history of human dispersal in Eurasia. Here we assessed the genetic structure and ancestry of 951 Uyghurs from Xinjiang (XJU) representing 14 geographical subpopulations. We observed southwest–northeast differentiation within XJU. It was likely shaped jointly by two forces: the Tianshan Mountains, which run east to west and act as a natural barrier, and gene flow from both the east and the west. In XJU, we identified four major ancestral components that were potentially derived from two earlier admixed groups. One came from the West and carried European (25–37%) and South Asian (12–20%) ancestries. The other came from the East and carried Siberian (15–17%) and East Asian (29–47%) ancestries. Using a newly developed method, MultiWaver, we modeled the complex admixture history of XJU as a two-wave admixture. The ancient wave dates to ∼3,750 years ago (ya), much earlier than previous estimates. It nonetheless falls within the date range (4,000–2,000 ya) of the mummies with European features discovered in the Tarim Basin in southern Xinjiang. The more recent wave occurred ∼750 ya, in agreement with an estimate from a recent study that used other methods. We reveal a more complex scenario of ancestral origins and admixture history in XJU than previously reported, which further suggests massive Bronze Age migrations in Eurasia and East–West contacts across the Silk Road.
Background: The genetic relationship between Sherpas and Tibetans reported by recent studies is controversial. To gain insights into the population history and the genetic basis of high-altitude adaptation of the two groups, we analyzed genome-wide data from 111 Sherpas (Tibet and Nepal) and 177 Tibetans (Tibet and Qinghai), together with available data from present-day human populations. Results: Sherpas and Tibetans show considerable genetic differences and can be distinguished as two distinct groups. This holds even though the divergence between them (∼3,200–11,300 years ago) is much more recent than that between Han Chinese and either group (∼6,200–16,000 years ago). Sub-population structure exists in both Sherpas and Tibetans, corresponding to geographical or linguistic groups. We identified genetic variants that are differentiated between Sherpas and Tibetans and associated with adaptation to high altitude or ultraviolet radiation, and validated them by genotyping additional Sherpa and Tibetan samples. Conclusions: Our analyses indicate that both Sherpas and Tibetans are admixed populations. However, the findings do not support the previous hypothesis that Tibetans derive their ancestry from Sherpas and Han Chinese. Compared with Tibetans, Sherpas show higher levels of South Asian ancestry, whereas Tibetans show higher levels of East Asian and Central Asian/Siberian ancestry. We propose a new model to elucidate the differentiated demographic histories and local adaptations of Sherpas and Tibetans. Electronic supplementary material: The online version of this article (doi:10.1186/s13059-017-1242-y) contains supplementary material, which is available to authorized users.
The length of ancestral tracks decays over generations, a property that can be used to infer population admixture histories. Previous studies have shown that the length distributions of ancestral tracks can recover the histories of admixed populations, even under simple models. We reasoned that deriving these length distributions under a general model would substantially increase this power. Here, we first derived the length distributions under a general model and proposed general principles for parameter estimation and model selection based on them. Next, we studied the length distributions and their applications under three typical special cases. Extensive simulations showed that our theoretical framework predicted the length distributions of ancestral tracks well. We further developed a new method, AdmixInfer, based on these length distributions, and it performed well in inferring population histories under the three typical models. Notably, our method was insensitive to demographic history, sample size, and the threshold used to discard short tracks. Finally, it also performed well on real datasets of African Americans, Mexicans, and South Asian populations from the HapMap project and the Human Genome Diversity Project.
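The simplest case shows how track lengths carry timing information. Take a hybrid-isolation (HI) model with a single admixture pulse T generations ago, in which population 1 contributes a proportion m. A standard approximation is that tracks of population-1 ancestry are exponentially distributed with rate (1 − m)T per Morgan. The Python below is a minimal sketch that assumes this approximation. The function names are hypothetical, and this is not the AdmixInfer implementation. Because the exponential distribution is memoryless, you can discard tracks shorter than a cutoff C and model the excess length L − C without changing the rate. That gives one intuitive reason why a length-based estimator can be insensitive to the short-track threshold.

```python
import random
from typing import List, Sequence


def estimate_hi_admixture_time(tracks: Sequence[float], m: float, cutoff: float = 0.0) -> float:
    """Estimate admixture time T (generations) under an idealized HI model.

    Assumes population-1 tracks (lengths in Morgans) are exponential with
    rate (1 - m) * T, where m is population 1's ancestry proportion.
    Tracks no longer than `cutoff` are discarded; by memorylessness, the
    excess length (L - cutoff) of retained tracks has the same rate.
    """
    if not 0.0 <= m < 1.0:
        raise ValueError("m must be in [0, 1)")
    excess = [length - cutoff for length in tracks if length > cutoff]
    if not excess:
        raise ValueError("no tracks longer than the cutoff")
    rate = len(excess) / sum(excess)  # maximum-likelihood exponential rate
    return rate / (1.0 - m)


def simulate_hi_tracks(n: int, m: float, t: float, seed: int = 0) -> List[float]:
    """Draw n lengths from the idealized exponential model (not a genome simulation)."""
    rng = random.Random(seed)
    return [rng.expovariate((1.0 - m) * t) for _ in range(n)]


if __name__ == "__main__":
    tracks = simulate_hi_tracks(20000, m=0.3, t=20)
    for c in (0.0, 0.01, 0.05):
        t_hat = estimate_hi_admixture_time(tracks, m=0.3, cutoff=c)
        print(f"cutoff={c:.2f} M  estimated T={t_hat:.1f}")
```

All three cutoffs should give estimates close to the simulated T = 20. The simulator only draws from the idealized exponential, so this checks the estimator, not the HI approximation itself.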
Human genetic adaptation to high altitudes (>2,500 m) has been extensively studied over the last few years. However, few functional adaptive genetic variants have been identified, largely because previous studies lacked deep whole-genome sequencing data. Here, we build a list of putative adaptive variants, including 63 missense variants, 7 loss-of-function variants, 1,298 evolutionarily conserved variants, and 509 expression quantitative trait loci (eQTLs). Notably, the top signal of selection is located in TMEM247, a gene encoding a transmembrane protein. The Tibetan version of TMEM247 harbors one high-frequency (76.3%) missense variant, rs116983452 (c.248C > T; p.Ala83Val). Its T allele is derived from archaic ancestry and carried by >94% of Tibetans, but is absent or rare (<3%) in non-Tibetan populations. rs116983452-T is strongly and positively correlated with altitude. It is also significantly associated with reduced hemoglobin concentration (p = 5.78 × 10⁻⁵), red blood cell count (p = 5.72 × 10⁻⁷), and hematocrit (p = 2.57 × 10⁻⁶). In particular, TMEM247-rs116983452 shows a greater effect size than any EPAS1 variant associated with adaptive traits in Tibetans, and it better predicts the phenotypic outcome. Modeling the interaction between TMEM247-rs116983452 and EPAS1 variants indicates weak but statistically significant epistatic effects. Our results suggest that multiple variants may jointly contribute to the fitness of Tibetans on the plateau, and that a complex model is needed to elucidate the mechanism of adaptive evolution.
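The abstract quotes two frequencies for rs116983452-T: an allele frequency of 76.3% and >94% of Tibetans carrying the allele. A quick check shows the two figures agree. Under Hardy–Weinberg equilibrium, the expected fraction of carriers is 1 − (1 − p)². The sketch below assumes Hardy–Weinberg proportions, which the abstract does not state, and its function name is hypothetical.

```python
def carrier_frequency(allele_freq: float) -> float:
    """Expected fraction of individuals carrying at least one copy of an allele,
    assuming Hardy-Weinberg equilibrium: 1 - (1 - p)^2."""
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele frequency must be in [0, 1]")
    return 1.0 - (1.0 - allele_freq) ** 2


if __name__ == "__main__":
    # rs116983452-T allele frequency in Tibetans, as reported in the abstract
    print(f"{carrier_frequency(0.763):.3f}")  # prints 0.944
```

With p = 0.763, about 94.4% of individuals are expected to carry at least one T allele, in line with the >94% carrier figure.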
Although the Han Chinese are the largest ethnic group in the world, they are underrepresented in global efforts to catalogue the genomic variability of natural populations. Here, we developed the PGG.Han, a population genome database that serves as the central repository for the genomic data of the Han Chinese Genome Initiative (Phase I). In its current version, the PGG.Han archives whole-genome sequences or high-density genome-wide single-nucleotide variants (SNVs) of 114,783 Han Chinese individuals (a.k.a. the Han100K). These represent geographical sub-populations covering 33 of the 34 administrative divisions of China, as well as Singapore. The PGG.Han provides: (i) an interactive interface for visualizing the fine-scale genetic structure of the Han Chinese population; (ii) genome-wide allele frequencies of hierarchical sub-populations; (iii) ancestry inference for individual samples and control of population stratification based on nested ancestry informative marker (AIM) panels; (iv) population-structure-aware shared control data for genotype-phenotype association studies (e.g. GWASs); and (v) a Han-Chinese-specific reference panel for genotype imputation. Computational tools are implemented in the PGG.Han, and a user-friendly online interface is provided for data analysis and visualization of results. The PGG.Han database is freely accessible via http://www.pgghan.org or https://www.hanchinesegenomes.org.
Our goal in developing the MultiWaver software series was to infer population admixture history under various complex scenarios. The earlier version of MultiWaver considered only discrete admixture models. Here, we report a newly developed version, MultiWaver 2.0. It implements a more flexible framework that can infer multiple-wave admixture histories under both discrete and continuous admixture models. MultiWaver 2.0 automatically selects an optimal admixture model based on the length distribution of ancestral tracks along chromosomes. It then estimates the corresponding parameters under the selected model. Specifically, for discrete admixture models, we used a likelihood ratio test (LRT) to determine the optimal discrete model and an expectation-maximization (EM) algorithm to estimate the parameters. In addition, we compared the optimal discrete model with several continuous admixture models using the Bayesian Information Criterion (BIC). We also applied a bootstrapping technique to provide levels of support for the chosen model and confidence intervals (CIs) for the estimated admixture times. Simulation studies validated the reliability and effectiveness of our method. Finally, the program performed well when applied to real datasets of typical admixed populations, such as African Americans, Uyghurs, and Hazaras.
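The sketch below illustrates the kind of computation involved. It rests on a simplifying assumption: that under a discrete multiple-wave model, ancestral track lengths follow a mixture of exponential distributions, roughly one component per wave. The code fits a k-component exponential mixture by EM and compares models by BIC and by an LRT statistic. It is a generic, hypothetical illustration, and all names are ours. It is not MultiWaver's likelihood or code. When the test compares numbers of mixture components, the LRT statistic does not follow the usual chi-square null distribution. In practice its significance would need calibration, for example by bootstrap.

```python
import math
import random
from typing import List, Sequence, Tuple


def fit_exponential_mixture(
    x: Sequence[float], k: int, iters: int = 200
) -> Tuple[List[float], List[float], float]:
    """Fit a k-component exponential mixture to positive lengths by EM.

    Returns (weights, rates, log_likelihood).
    """
    n = len(x)
    mean = sum(x) / n
    # Deterministic start: rates spread geometrically around 1 / mean.
    rates = [4.0 ** (j - (k - 1) / 2) / mean for j in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        resp_sum = [0.0] * k  # sum over tracks of responsibilities r_ij
        resp_x_sum = [0.0] * k  # sum over tracks of r_ij * x_i
        # E-step: posterior probability that track i came from component j.
        for xi in x:
            dens = [w * r * math.exp(-r * xi) for w, r in zip(weights, rates)]
            total = sum(dens)
            for j in range(k):
                r_ij = dens[j] / total
                resp_sum[j] += r_ij
                resp_x_sum[j] += r_ij * xi
        # M-step: closed-form updates for mixture weights and exponential rates.
        weights = [s / n for s in resp_sum]
        rates = [s / sx for s, sx in zip(resp_sum, resp_x_sum)]
    loglik = sum(
        math.log(sum(w * r * math.exp(-r * xi) for w, r in zip(weights, rates)))
        for xi in x
    )
    return weights, rates, loglik


def bic(loglik: float, k: int, n: int) -> float:
    """BIC for a k-component mixture: k rates plus (k - 1) free weights."""
    return (2 * k - 1) * math.log(n) - 2.0 * loglik


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical data: two "waves" giving tracks with rates 5 and 50 per Morgan.
    data = [rng.expovariate(5.0) for _ in range(2000)]
    data += [rng.expovariate(50.0) for _ in range(2000)]
    fits = {k: fit_exponential_mixture(data, k) for k in (1, 2)}
    for k, (w, r, ll) in fits.items():
        print(k, [round(v, 2) for v in w], [round(v, 1) for v in r],
              round(bic(ll, k, len(data)), 1))
    print("LRT statistic (2 vs 1):", round(2.0 * (fits[2][2] - fits[1][2]), 1))
```

On these simulated tracks, the two-component fit should recover rates near 5 and 50 and have the lower BIC. A bootstrap over tracks could be layered on top of this fit to attach support levels and confidence intervals.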
Zinc transporters play important roles in all eukaryotes by maintaining appropriate zinc concentrations in cells. However, the diversity of zinc transporter genes (ZTGs) remains poorly studied. Here, we investigated the genetic diversity of 24 human ZTGs based on data from the 1000 Genomes Project. Some ZTGs show small population differences, such as SLC30A6 (weighted-average FST, WA-FST = 0.015). Others exhibit considerably large population differences, such as SLC30A9 (WA-FST = 0.284). Overall, ZTGs harbor many more highly population-differentiated variants than random genes. Intriguingly, we found that SLC30A9 has been under natural selection in both East Asians (EAS) and Africans (AFR), but in different directions. Notably, a non-synonymous variant (rs1047626) in SLC30A9 is almost fixed for different alleles: the A allele is at 96.4% in EAS and the G allele at 92% in AFR. Consequently, two different functional haplotypes predominate in AFR and EAS, respectively. Furthermore, a strong correlation was observed between the haplotype frequencies of SLC30A9 and the distribution of zinc content in soils or crops. We speculate that genetic differentiation of ZTGs could directly contribute to population heterogeneity in zinc-transporting capability. It could also underlie local adaptation of human populations to local zinc availability or diets, with both evolutionary and medical implications.
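The abstract does not say which FST estimator underlies WA-FST, so the sketch below is an illustration only, not the study's method. It uses Hudson's estimator computed from population allele frequencies, with no sample-size correction. For a single SNP this reduces to (p₁ − p₂)² / [p₁(1 − p₂) + p₂(1 − p₁)]. SNPs are combined as a ratio of averages, which weights each SNP by its denominator. All function names are hypothetical. The example applies it to the rs1047626 frequencies quoted above. It assumes a biallelic site, so 92% G in AFR means 8% A.

```python
from typing import Iterable, Tuple


def hudson_parts(p1: float, p2: float) -> Tuple[float, float]:
    """Numerator and denominator of Hudson's FST for one biallelic SNP.

    p1, p2 are frequencies of the same allele in two populations; population
    frequencies are used directly (no sample-size correction).
    """
    numerator = (p1 - p2) ** 2  # equals H_between - H_within
    denominator = p1 * (1 - p2) + p2 * (1 - p1)  # H_between
    return numerator, denominator


def hudson_fst(p1: float, p2: float) -> float:
    num, den = hudson_parts(p1, p2)
    if den == 0:
        return 0.0  # same allele fixed in both populations: no differentiation
    return num / den


def weighted_average_fst(freq_pairs: Iterable[Tuple[float, float]]) -> float:
    """Combine SNPs as a ratio of averages: sum(numerators) / sum(denominators)."""
    parts = [hudson_parts(p1, p2) for p1, p2 in freq_pairs]
    total_den = sum(den for _, den in parts)
    if total_den == 0:
        return 0.0
    return sum(num for num, _ in parts) / total_den


if __name__ == "__main__":
    # rs1047626 A-allele frequency: 0.964 in EAS; 1 - 0.92 = 0.08 in AFR (biallelic)
    print(f"{hudson_fst(0.964, 0.08):.3f}")  # prints 0.878
```

The ratio-of-averages form avoids letting nearly monomorphic SNPs, whose per-SNP ratios are unstable, dominate a gene-level summary.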