Long homologous segments along the genomes of close or remote relatives that are identical by descent (IBD) from a common ancestor provide clues to recent events in human genetics. We set out to extensively map such IBD segments in large cohorts and investigate their distribution within and across different populations. We report analysis of several data sets, demonstrating that IBD is more common than expected under naïve models of population genetics. We show that the frequency of IBD pairs is population dependent and can be used to cluster individuals into populations, detect a homogeneous subpopulation within a larger cohort, and infer bottleneck events in such a subpopulation. Specifically, we show that Ashkenazi Jewish individuals are all connected through transitive remote family ties, evident from sharing of 50 cM IBD to a publicly available data set of fewer than 400 individuals. We further expose regions where long-range haplotypes are shared significantly more often than elsewhere in the genome; these regions are observed across multiple populations and are enriched for common long structural variation. They are inconsistent with recent relatedness and suggest ancient common ancestry, with limited recombination between haplotypes.
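The notion of individuals being "connected through transitive remote family ties" can be illustrated as connected components in a graph whose edges are pairs sharing enough IBD. The following is a minimal sketch, not the authors' pipeline: the function name `ibd_components`, the input format of `(id_a, id_b, shared_cm)` tuples, and the use of 50 cM as a per-pair threshold are all assumptions made for illustration.

```python
from collections import defaultdict


def ibd_components(individuals, ibd_pairs, min_cm=50.0):
    """Group individuals linked, directly or transitively, by IBD sharing.

    ibd_pairs is an iterable of (id_a, id_b, shared_cm) tuples; this input
    format and the min_cm threshold are hypothetical, chosen for the sketch.
    """
    parent = {ind: ind for ind in individuals}

    def find(x):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, shared_cm in ibd_pairs:
        if shared_cm >= min_cm:
            parent[find(a)] = find(b)  # merge the two groups

    groups = defaultdict(set)
    for ind in individuals:
        groups[find(ind)].add(ind)
    # Largest component first.
    return sorted(groups.values(), key=len, reverse=True)


# Toy data: A-B and B-C pass the threshold, so A and C are linked through B.
people = ["A", "B", "C", "D", "E"]
pairs = [("A", "B", 62.0), ("B", "C", 55.5), ("D", "E", 12.0)]
components = ibd_components(people, pairs)
```

In this toy input, A and C share no segment directly but fall in the same component through B, while D and E stay separate because their sharing is below the threshold.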
Purpose: To identify the genetic origins of autosomal recessive congenital cataracts (arCC) in the Pakistani population. Methods: Based on the hypothesis that most arCC patients in consanguineous families in the Punjab areas of Pakistan should be homozygous for causative mutations, affected individuals were screened for homozygosity of nearby highly informative microsatellite markers and then screened for pathogenic mutations by DNA sequencing. A total of 83 unmapped consanguineous families were screened for mutations in 33 known candidate genes. Results: Patients in 32 arCC families were homozygous for markers near at least 1 of the 33 known CC genes. Sequencing the included genes revealed homozygous cosegregating sequence changes in 10 families, 2 of which had the same variation. These included five missense, one nonsense, two frameshift, and one splice site mutation, eight of which were novel, in EPHA2, FOXE3, FYCO1, TDRD7, MIP, GALK1, and CRYBA4. Conclusions: The above results confirm the usefulness of homozygosity mapping for identifying genetic defects underlying autosomal recessive disorders in consanguineous families. In our ongoing study of arCC in Pakistan, including 83 arCC families that underwent homozygosity mapping, 3 mapped using genome-wide linkage analysis in unpublished data, and 30 previously reported families, mutations were detected in approximately 37.1% (43/116) of all families studied, suggesting that additional genes might be responsible in the remaining families. The most commonly mutated gene was FYCO1 (14%), followed by CRYBB3 (5.2%), GALK1 (3.5%), and EPHA2 (2.6%). This provides the first comprehensive description of the genetic architecture of arCC in the Pakistani population.
The detection of genomic segments that are identical by descent (IBD) in genome-wide association studies (GWAS) has proven successful in pinpointing genetic relatedness between reportedly unrelated individuals and in leveraging such regions to shortlist candidate genes. These techniques depend on high-density genotyping arrays, and their effectiveness in diverse sequence data is largely unknown. Due to the decreasing cost and increasing effectiveness of high-throughput techniques for whole-exome sequencing, an influx of exome sequencing data has become available. Studies using exomes and IBD-detection methods within known pedigrees have shown that IBD can be useful in finding hidden genetic candidates where known relatives are available. We set out to examine the viability of using IBD detection in whole-exome sequencing data in population-wide studies. In doing so, we extend GERMLINE to detect IBD from exome sequencing data; the method finds small slices of matching alleles between pairs of individuals and extends them into full IBD segments. This algorithm allows for efficient population-wide detection in dense data. We apply this algorithm to a cohort of Crohn's disease cases for which whole-exome and GWAS array data are available. We confirm that GWAS-based detected segments are highly accurate and predictive of underlying shared variation. Where segments inferred from GWAS are expected to be of high accuracy, we compare the exome-based detection accuracy of multiple detection strategies. We find detection accuracy to be prohibitively low in all assessments, in terms of both segment sensitivity and specificity. Even after isolating relatively long segments beyond 10 cM, exome-based detection continued to offer poor specificity/sensitivity tradeoffs. We hypothesize that the variable coverage and platform biases of exome capture account for this decreased accuracy and look toward whole-genome sequencing data as a higher-quality source for detecting population-wide IBD.
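The idea of finding small slices of matching alleles and extending them into longer segments can be sketched as follows. This is a simplified illustration under stated assumptions, not the GERMLINE implementation: it assumes phased haplotypes given as equal-length allele strings, requires exact matches within each slice (no tolerance for genotyping or phasing errors), and measures length in markers rather than cM. The names `find_shared_segments`, `window`, and `min_markers` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def find_shared_segments(haplotypes, window=4, min_markers=8):
    """Seed-and-extend sketch over phased haplotype strings.

    haplotypes maps a name to a string of alleles; all strings must have
    the same length. Returns (name_a, name_b, start, end) tuples with
    half-open marker ranges [start, end).
    """
    n_markers = len(next(iter(haplotypes.values())))
    n_windows = n_markers // window

    # Seed step: bucket haplotypes by their exact allele slice per window,
    # so only haplotypes sharing a bucket are ever compared as a pair.
    seeds = defaultdict(list)  # pair -> ascending list of matching windows
    for w in range(n_windows):
        buckets = defaultdict(list)
        for name, hap in haplotypes.items():
            buckets[hap[w * window:(w + 1) * window]].append(name)
        for names in buckets.values():
            for pair in combinations(sorted(names), 2):
                seeds[pair].append(w)

    # Extend step: merge runs of consecutive matching windows per pair.
    segments = []
    for pair, windows in seeds.items():
        start = prev = windows[0]
        for w in windows[1:] + [None]:  # None flushes the final run
            if w is not None and w == prev + 1:
                prev = w
                continue
            begin, end = start * window, (prev + 1) * window
            if end - begin >= min_markers:
                segments.append((pair[0], pair[1], begin, end))
            if w is not None:
                start = prev = w
    return sorted(segments)


# Toy haplotypes of 16 markers each (four windows of four markers).
haps = {
    "h1": "0101101001011100",
    "h2": "0101101001010011",
    "h3": "1110000101011100",
}
segments_found = find_shared_segments(haps)
```

Bucketing by slice is what keeps the seed step cheap on large cohorts: pairs are only formed among haplotypes that already share a slice, rather than by comparing all pairs at every position.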