We investigated the influences of admixture and consanguinity on the genetic architecture of disease by generating a database of variants derived from exome sequencing (ES) of 853 unrelated Turkish (TK) individuals with different disease phenotypes. We observed that TK genomes are more similar to those of Europeans, with 69.3% of the unique variants (N = 356,613) not present in the Greater Middle Eastern variome. We found higher inbreeding coefficient values in the TK cohort, correlating with a larger median span of long-sized (>1.606 Mb) runs of homozygosity (ROH). We show that long-sized ROHs arose from recently configured haplotypes and are enriched for rare homozygous deleterious variants. Such haplotypes, and the combinatorial effect of their embedded ultra-rare variants, provide the most explanatory molecular diagnoses for the TK individuals' observed disease traits. Such haplotype evolution results in homozygosity of disease-associated haplotypes due to identity-by-descent in a family or extended clan. This is consistent with the Clan Genomics hypothesis (Lupski et al., 2011), the notion that novel rare variant alleles arising within a patient, or in the recent past generations of a family or more extended clan, significantly contribute to disease in populations. These data underscore the value of studying alternative population substructures that present with high levels of both admixture and consanguinity to elucidate the molecular mechanisms and the genetic and genomic architecture underlying disease in populations.
Results
The Turkish (TK) variome
After removing related individuals from the Baylor Hopkins Center for Mendelian Genomics (BHCMG) database, exomes from 4,933 unrelated individuals remained for further analyses. Using this data set, we applied principal component analysis (PCA) to investigate the population structure of our TK and non-TK cohorts in comparison to the African, Asian, and European population samples from the 1000 Genomes Project. The first principal component axis (PC1) separated the African samples from the Asian and European populations and from the TK and non-TK cohorts of the BHCMG samples (Figure 1A). The second principal component axis (PC2) further separated the Asian samples from the European samples and from the TK and non-TK groups of the BHCMG cohort. These analyses showed that the TK genomes were distinct from the African and Asian populations and more similar to the variomes of European samples than were the non-TK samples, which spread out across different populations (Figure 1A and Supplementary Figure 1).
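To illustrate how PCA separates samples by ancestry, the following is a minimal sketch on simulated genotypes. It is not the pipeline used in this study: the function names `genotype_pca` and `simulate_population`, the two hypothetical populations "A" and "B", and all allele frequencies are assumptions made for illustration only.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """Return principal component scores for a genotype matrix.

    genotypes: (n_samples, n_variants) array of alternate-allele
    counts (0, 1 or 2). Hypothetical helper, for illustration only.
    """
    g = np.asarray(genotypes, dtype=float)
    # Drop monomorphic variants: they carry no structure information
    # and would cause a division by zero when scaling.
    g = g[:, g.std(axis=0) > 0]
    # Centre and scale each variant to zero mean and unit variance.
    g = (g - g.mean(axis=0)) / g.std(axis=0)
    # The SVD of the standardised matrix gives the principal axes;
    # U * S are the sample coordinates (scores) on those axes.
    u, s, _ = np.linalg.svd(g, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def simulate_population(rng, allele_freqs, n_samples):
    """Draw diploid genotypes as Binomial(2, p) per variant (assumed model)."""
    return rng.binomial(2, allele_freqs, size=(n_samples, allele_freqs.size))


rng = np.random.default_rng(0)
n_variants = 500

# Two simulated populations whose allele frequencies have drifted apart.
freq_a = rng.uniform(0.05, 0.95, n_variants)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.3, n_variants), 0.02, 0.98)

genotypes = np.vstack([
    simulate_population(rng, freq_a, 40),
    simulate_population(rng, freq_b, 40),
])
labels = np.array(["A"] * 40 + ["B"] * 40)

scores = genotype_pca(genotypes)
for pop in ("A", "B"):
    print(pop, "mean PC1 score:", scores[labels == pop, 0].mean())
```

In this toy setting the two simulated populations fall at opposite ends of PC1, analogous to the way PC1 and PC2 separate the reference populations in Figure 1A. Real analyses typically also prune variants in linkage disequilibrium and filter on allele frequency before PCA; those steps are omitted here.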