We report the Simons Genome Diversity Project (SGDP) dataset: high quality genomes from 300 individuals from 142 diverse populations. These genomes include at least 5.8 million base pairs that are not present in the human reference genome. Our analysis reveals key features of the landscape of human genome variation, including that the rate of accumulation of mutations has accelerated by about 5% in non-Africans compared to Africans since divergence. We show that the ancestors of some pairs of present-day human populations were substantially separated by 100,000 years ago, well before the archaeologically attested onset of behavioral modernity. We also demonstrate that indigenous Australians, New Guineans and Andamanese do not derive substantial ancestry from an early dispersal of modern humans; instead, their modern human ancestry is consistent with coming from the same source as that in other non-Africans.
Congenital heart disease (CHD) is the leading cause of mortality from birth defects. Exome sequencing of a single cohort of 2,871 CHD probands including 2,645 parent-offspring trios implicated rare inherited mutations in 1.8%, including a recessive founder mutation in GDF1 accounting for ~5% of severe CHD in Ashkenazim, recessive genotypes in MYH6 accounting for ~11% of Shone complex, and dominant FLT4 mutations accounting for 2.3% of Tetralogy of Fallot. De novo mutations (DNMs) accounted for 8% of cases, including ~3% of isolated CHD patients and ~28% with both neurodevelopmental and extra-cardiac congenital anomalies. Seven genes surpassed thresholds for genome-wide significance and 12 genes not previously implicated in CHD had > 70% probability of being disease-related; DNMs in ~440 genes are inferred to contribute to CHD. There was striking overlap between genes with damaging DNMs in probands with CHD and autism.
The ETS gene family is frequently involved in chromosome translocations that cause human cancer, including prostate cancer, leukemia, and sarcoma. However, the mechanisms by which oncogenic ETS proteins, which are DNA-binding transcription factors, target genes necessary for tumorigenesis are not well understood. Ewing's sarcoma serves as a paradigm for the entire class of ETS-associated tumors because nearly all cases harbor recurrent chromosomal translocations involving ETS genes. The most common translocation in Ewing's sarcoma encodes the EWS/FLI oncogenic transcription factor. We used whole genome localization (ChIP-chip) to identify target genes that are directly bound by EWS/FLI. Analysis of the promoters of these genes demonstrated a significant over-representation of highly repetitive GGAA-containing elements (microsatellites). In a parallel approach, we found that EWS/FLI uses GGAA microsatellites to regulate the expression of some of its target genes including NR0B1, a gene required for Ewing's sarcoma oncogenesis. The microsatellite in the NR0B1 promoter bound EWS/FLI in vitro and in vivo and was both necessary and sufficient to confer EWS/FLI regulation to a reporter gene. Genome-wide computational studies demonstrated that GGAA microsatellites were enriched close to EWS/FLI-up-regulated genes but not down-regulated genes. Mechanistic studies demonstrated that the ability of EWS/FLI to bind DNA and modulate gene expression through these repetitive elements depended on the number of consecutive GGAA motifs. These findings illustrate an unprecedented route to specificity for ETS proteins and use of microsatellites in tumorigenesis.
About a fifth of the human gene pool belongs largely either to Indo-European or Dravidic speaking people inhabiting the Indian peninsula. The 'Caucasoid share' in their gene pool is thought to be related predominantly to the Indo-European speakers. A commonly held hypothesis, albeit not the only one, suggests a massive Indo-Aryan invasion to India some 4,000 years ago [1]. Recent limited analysis of maternally inherited mitochondrial DNA (mtDNA) of Indian populations has been interpreted as supporting this concept [2] [3]. Here, this interpretation is questioned. We found an extensive deep late Pleistocene genetic link between contemporary Europeans and Indians, provided by the mtDNA haplogroup U, which encompasses roughly a fifth of mtDNA lineages of both populations. Our estimate for this split is close to the suggested time for the peopling of Asia and the first expansion of anatomically modern humans in Eurasia [4] [5] [6] [7] [8] and likely pre-dates their spread to Europe. Only a small fraction of the 'Caucasoid-specific' mtDNA lineages found in Indian populations can be ascribed to a relatively recent admixture.
In order to explore the diversity and selective signatures of duplication and deletion human copy number variants (CNVs), we sequenced 236 individuals from 125 distinct human populations. We observed that duplications exhibit fundamentally different population genetic and selective signatures than deletions and are more likely to be stratified between human populations. Through reconstruction of the ancestral human genome, we identify megabases of DNA lost in different human lineages and pinpoint large duplications that introgressed from the extinct Denisova lineage now found at high frequency exclusively in Oceanic populations. We find that the proportion of CNV base pairs to single nucleotide variant base pairs is greater among non-Africans than it is among African populations, but we conclude that this difference is likely due to unique aspects of non-African population history as opposed to differences in CNV load.
We report a comparison of worldwide genetic variation among 255 individuals by using autosomal, mitochondrial, and Y-chromosome polymorphisms. Variation is assessed by use of 30 autosomal restriction-site polymorphisms (RSPs), 60 autosomal short-tandem-repeat polymorphisms (STRPs), 13 Alu-insertion polymorphisms and one LINE-1 element, 611 bp of mitochondrial control-region sequence, and 10 Y-chromosome polymorphisms. Analysis of these data reveals substantial congruity among this diverse array of genetic systems. With the exception of the autosomal RSPs, in which an ascertainment bias exists, all systems show greater gene diversity in Africans than in either Europeans or Asians. Africans also have the largest total number of alleles, as well as the largest number of unique alleles, for most systems. GST values are 11%-18% for the autosomal systems and are two to three times higher for the mtDNA sequence and Y-chromosome RSPs. This difference is expected because of the lower effective population size of mtDNA and Y chromosomes. A lower value is seen for Y-chromosome STRs, reflecting a relative lack of continental population structure, as a result of rapid mutation and genetic drift. Africa has higher GST values than does either Europe or Asia for all systems except the Y-chromosome STRs and Alus. All systems except the Y-chromosome STRs show less variation between populations within continents than between continents. These results are reassuring in their consistency and offer broad support for an African origin of modern human populations.
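The GST values quoted above measure the share of total gene diversity that lies between populations. As a minimal sketch, assuming Nei's standard definition GST = (HT - HS) / HT with equally weighted subpopulations (the abstract does not state the exact estimator used), the calculation for a single locus could look like this. The function names and the allele frequencies are hypothetical and chosen only for illustration; they are not data from the study.

```python
def expected_heterozygosity(freqs):
    """Gene diversity at one locus: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def g_st(subpop_freqs):
    """Nei's G_ST for one locus, weighting every subpopulation equally.

    subpop_freqs: list of allele-frequency lists, one per subpopulation,
    all listing the same alleles in the same order.
    """
    n_pops = len(subpop_freqs)
    n_alleles = len(subpop_freqs[0])

    # H_S: mean within-subpopulation gene diversity.
    h_s = sum(expected_heterozygosity(f) for f in subpop_freqs) / n_pops

    # H_T: gene diversity of the pooled population (mean allele frequencies).
    mean_freqs = [
        sum(f[i] for f in subpop_freqs) / n_pops for i in range(n_alleles)
    ]
    h_t = expected_heterozygosity(mean_freqs)

    # A monomorphic locus has no diversity to partition.
    if h_t == 0:
        return 0.0
    return (h_t - h_s) / h_t


# Hypothetical biallelic locus in two populations (illustrative values only).
example = g_st([[0.8, 0.2], [0.4, 0.6]])
print(round(example, 4))
```

With these illustrative frequencies HS is 0.40 and HT is 0.48, so about 17% of the diversity lies between the two populations, which happens to fall within the 11%-18% range reported for the autosomal systems.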
A major goal of biomedical research is to develop the capability to provide highly personalized health care. To do so, it is necessary to understand the distribution of interindividual genetic variation at loci underlying physical characteristics, disease susceptibility, and response to treatment. Variation at these loci commonly exhibits geographic structuring and may contribute to phenotypic differences between groups. Thus, in some situations, it may be important to consider these groups separately. Membership in these groups is commonly inferred by use of a proxy such as place-of-origin or ethnic affiliation. These inferences are frequently weakened, however, by use of surrogates, such as skin color, for these proxies, the distribution of which bears little resemblance to the distribution of neutral genetic variation. Consequently, it has become increasingly controversial whether proxies are sufficient and accurate representations of groups inferred from neutral genetic variation. This raises three questions: how many data are required to identify population structure at a meaningful level of resolution, to what level can population structure be resolved, and do some proxies represent population structure accurately? We assayed 100 Alu insertion polymorphisms in a heterogeneous collection of approximately 565 individuals, approximately 200 of whom were also typed for 60 microsatellites. When samples were stripped of identifying information, correct assignment to the continent of origin (Africa, Asia, or Europe) with a mean accuracy of at least 90% required a minimum of 60 Alu markers or microsatellites, and accuracy reached 99%-100% when ≥100 loci were used. Less accurate assignment (87%) to the appropriate genetic cluster was possible for a historically admixed sample from southern India.
These results set a minimum for the number of markers that must be tested to make strong inferences about detecting population structure among Old World populations under ideal experimental conditions. We note that, whereas some proxies correspond crudely, if at all, to population structure, the heuristic value of others is much higher. This suggests that a more flexible framework is needed for making inferences about population structure and the utility of proxies.