While genetic relatedness, usually manifested as segments identical by descent (IBD), is ubiquitous in modern large biobanks, current IBD detection methods are not efficient at such a scale. Here, we describe an efficient method, RaPID, for detecting IBD segments in a panel with phased haplotypes. RaPID achieves a time and space complexity linear to the input size and the number of reported IBDs. With simulation, we showed that RaPID is orders of magnitude faster than existing method while offering competitive power and accuracy. In UK Biobank, RaPID identified 3,335,807 IBDs with a lenght ≥ 10 cM among 223,507 male X chromosomes in 11 min. Electronic supplementary material The online version of this article (10.1186/s13059-019-1754-8) contains supplementary material, which is available to authorized users.
Background The genealogical histories of individuals within populations are of interest to studies aiming both to uncover detailed pedigree information and overall quantitative population demographic histories. However, the analysis of quantitative details of individual genealogical histories has faced challenges from incomplete available pedigree records and an absence of objective and quantitative details in pedigree information. Although complete pedigree information for most individuals is difficult to track beyond a few generations, it is possible to describe a person’s genealogical history using their genetic relatives revealed by identity by descent (IBD) segments—long genomic segments shared by two individuals within a population, which are identical due to inheritance from common ancestors. When modern biobanks collect genotype information for a significant fraction of a population, dense genetic connections of a person can be traced using such IBD segments, offering opportunities to characterize individuals in the context of the underlying populations. Here, we conducted an individual-centric analysis of IBD segments among the UK Biobank participants that represent 0.7% of the UK population. Results We made a high-quality call set of IBD segments over 5 cM among all 500,000 UK Biobank participants. On average, one UK individual shares IBD segments with 14,000 UK Biobank participants, which we refer to as “relatives.” Using these segments, approximately 80% of a person’s genome can be imputed. We subsequently propose genealogical descriptors based on the genetic connections of relative cohorts of individuals sharing at least one IBD segment and show that such descriptors offer important information about one’s genetic makeup, personal genealogical history, and social behavior. Through analysis of relative counts sharing segments at different lengths, we identified a group, potentially British Jews, who has a distinct pattern of familial expansion history. Finally, using the enrichment of relatives in one’s neighborhood, we identified regional variations of personal preference favoring living closer to one’s extended families. Conclusions Our analysis revealed genetic makeup, personal genealogical history, and social behaviors at the population scale, opening possibilities for further studies of individual’s genetic connections in biobank data.
In the recent biobank era of genetics, the problem of identical-by-descent (IBD) segment detection received renewed interest, as IBD segments in large cohorts offer unprecedented opportunities in the study of population and genealogical history, as well as genetic association of long haplotypes. While a new generation of efficient methods for IBD segment detection becomes available, direct comparison of these methods is difficult: existing benchmarks were often evaluated in different datasets, with some not openly accessible; methods benchmarked were run under suboptimal parameters; and benchmark performance metrics were not defined consistently. Here, we developed a comprehensive and completely open-source evaluation of the power, accuracy, and resource consumption of these IBD segment detection methods using realistic population genetic simulations with various settings. Our results pave the road for fair evaluation of IBD segment detection methods and provide an practical guide for users.
When modern biobanks collect genotype information for a significant fraction of a population, dense genetic connections of a person can be traced using identity by descent (IBD) segments. These connections offer opportunities to characterize individuals in the context of the underlying populations. Here, we conducted an individual-centric analysis of IBDs among the UK Biobank participants that represent 0.7% of the UK population. On average, one UK individual shares IBDs over 5 cM with 14,000 UK Biobank participants, which we refer to as "cousins". Using these segments, approximately 80% of a person's genome can be reconstructed. Also, using changes of cousin counts sharing IBDs at different lengths, we identified a group, potentially British Jews, who has a distinct pattern of familial expansion history. Finally, using the enrichment of cousins in one's neighborhood, we identified regional variations of personal preference favoring living closer to one's extended families. In summary, our analysis revealed genetic makeup, personal genealogical history, and social behaviors at population scale, opening possibilities for further studies of individual's genetic connections in biobank data.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.