The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
The timing and nature of the arrival and the subsequent expansion of modern humans into eastern Asia remain controversial. Using Y-chromosome biallelic markers, we investigated the ancient human-migration patterns in eastern Asia. Our data indicate that southern populations in eastern Asia are much more polymorphic than northern populations, which have only a subset of the southern haplotypes. This pattern indicates that the first settlement of modern humans in eastern Asia occurred in mainland Southeast Asia during the last Ice Age, coinciding with the absence of human fossils in eastern Asia, 50,000-100,000 years ago. After the initial peopling, a great northward migration extended into northern China and Siberia.
Asia harbors substantial cultural and linguistic diversity, but the geographic structure of genetic variation across the continent remains enigmatic. Here we report a large-scale survey of autosomal variation from a broad geographic sample of Asian human populations. Our results show that genetic ancestry is strongly correlated with linguistic affiliations as well as geography. Most populations show relatedness within ethnic/linguistic groups, despite prevalent gene flow among populations. More than 90% of East Asian (EA) haplotypes could be found in either Southeast Asian (SEA) or Central-South Asian (CSA) populations and show clinal structure with haplotype diversity decreasing from south to north. Furthermore, 50% of EA haplotypes were found in SEA only and 5% were found in CSA only, indicating that SEA was a major geographic source of EA populations.
Despite the fact that the continuity of morphology of fossil specimens of modern humans found in China has repeatedly challenged the Out-of-Africa hypothesis, Chinese populations are underrepresented in genetic studies. Genetic profiles of 28 populations sampled in China supported the distinction between southern and northern populations, while the latter are biphyletic. Linguistic boundaries are often transgressed across language families studied, reflecting substantial gene flow between populations. Nevertheless, genetic evidence does not support an independent origin of Homo sapiens in China. The phylogeny also suggested that it is more likely that ancestors of the populations currently residing in East Asia entered from Southeast Asia.
Short tandem repeats (STRs) are short tandemly repeated DNA sequences that involve a repetitive unit of 1–6 bp. Because of their polymorphisms and high mutation rates, STRs are widely used in biological research. Strand-slippage replication is the predominant mutation mechanism of STRs, and the stepwise mutation model is regarded as the main mutation model. STR mutation rates can be influenced by many factors. Moreover, some trinucleotide repeats are associated with human neurodegenerative diseases. In order to deepen our knowledge of these diseases and broaden STR application, it is essential to understand the STR mutation process in detail. In this review, we focus on what is currently known about STR mutation.
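As an illustration of the stepwise mutation model mentioned above, the following is a minimal sketch that tracks the repeat count of a single STR allele across generations. It assumes the simplest symmetric form of the model, in which each mutation adds or removes exactly one repeat unit with equal probability. The function name `simulate_smm`, the mutation rate, the starting repeat count, and the lower bound of one repeat are all hypothetical choices made for this example and are not taken from the review.

```python
import random


def simulate_smm(start_repeats, generations, mutation_rate, rng, min_repeats=1):
    """Simulate one STR allele under a symmetric stepwise mutation model.

    Each generation the allele mutates with probability `mutation_rate`.
    A mutation changes the repeat count by exactly one unit (+1 or -1,
    equally likely). The count is never allowed below `min_repeats`
    (an assumption of this sketch, not part of the model itself).
    Returns the repeat count at every generation, including the start.
    """
    repeats = start_repeats
    history = [repeats]
    for _ in range(generations):
        if rng.random() < mutation_rate:
            step = rng.choice((-1, 1))  # single-unit gain or loss
            repeats = max(min_repeats, repeats + step)
        history.append(repeats)
    return history


# Illustrative values only: 10 starting repeats, 1000 generations,
# and a per-generation mutation rate of 0.01.
rng = random.Random(0)
history = simulate_smm(start_repeats=10, generations=1000,
                       mutation_rate=0.01, rng=rng)
print("final repeat count:", history[-1])
```

Real STR loci can deviate from this picture, for example through multi-step changes or length-dependent rates, so the sketch is a starting point rather than a description of any particular locus.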
Chemokine receptor CCR2 and stromal-derived factor (SDF-1) are involved in HIV infection and AIDS symptom onset. Recent cohort studies showed that point mutations in these two genes, CCR2-64I and SDF1-3'A, can delay AIDS onset ≥16 years after seroconversions. The protective effect of CCR2-64I is dominant, whereas that of SDF1-3'A is recessive. SDF1-3'A homozygotes also showed possible protection against HIV-1 infection. In this study, we surveyed the frequency distributions of the two alleles at both loci in world populations, with emphasis on those in east Asia. The CCR2-64I frequencies do not vary significantly in the different continents, having a range of 0.1-0.2 in most populations. A decreasing cline of the CCR2-64I frequency from north to south was observed in east Asia. In contrast, the distribution of SDF1-3'A in world populations varies substantially, and the highest frequency was observed in Oceanian populations. Moreover, an increasing cline of the SDF1-3'A frequency from north to south was observed in east Asia. The relative hazard values were computed to evaluate the risk of AIDS onset on the basis of two-locus genotypes in the east Asian and world populations.
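To make the dominant versus recessive distinction above concrete, the sketch below estimates what fraction of a population would carry a protective genotype at a given allele frequency. It assumes Hardy-Weinberg proportions, which is an assumption of this example and not something the abstract states. The function name `protected_fraction` is hypothetical. The CCR2-64I frequencies of 0.1 and 0.2 are the bounds of the range quoted above, while the SDF1-3'A frequency of 0.2 is a placeholder chosen only for illustration. This is not the relative hazard calculation used in the study.

```python
def protected_fraction(allele_freq, mode):
    """Fraction of individuals with a protective genotype.

    Assumes Hardy-Weinberg genotype proportions (assumption of this sketch).
    mode="dominant":  one or two copies protect, 1 - (1 - p)^2
    mode="recessive": only homozygotes are protected, p^2
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele_freq must be between 0 and 1")
    if mode == "dominant":
        return 1.0 - (1.0 - allele_freq) ** 2
    if mode == "recessive":
        return allele_freq ** 2
    raise ValueError("mode must be 'dominant' or 'recessive'")


# CCR2-64I acts dominantly; 0.1 and 0.2 bound the range quoted in the text.
ccr2_low = protected_fraction(0.1, "dominant")
ccr2_high = protected_fraction(0.2, "dominant")

# SDF1-3'A acts recessively; 0.2 is a placeholder frequency, not a reported value.
sdf1_example = protected_fraction(0.2, "recessive")

print(f"CCR2-64I carriers: {ccr2_low:.2f} to {ccr2_high:.2f}")
print(f"SDF1-3'A homozygotes at the placeholder frequency: {sdf1_example:.2f}")
```

At equal allele frequencies, a dominant allele protects far more individuals than a recessive one, which is why the mode of action matters when comparing the two loci across populations.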