Detection of Identical-By-Descent (IBD) segments provides a fundamental measure of genetic relatedness and plays a key role in a wide range of analyses. We develop FastSMC, an IBD detection algorithm that combines a fast heuristic search with accurate coalescent-based likelihood calculations. FastSMC enables biobank-scale detection and dating of IBD segments within several thousands of years in the past. We apply FastSMC to 487,409 UK Biobank samples and detect ~214 billion IBD segments transmitted by shared ancestors within the past 1500 years, obtaining a fine-grained picture of genetic relatedness in the UK. Sharing of common ancestors strongly correlates with geographic distance, enabling the use of genomic data to localize a sample’s birth coordinates with a median error of 45 km. We seek evidence of recent positive selection by identifying loci with unusually strong shared ancestry and detect 12 genome-wide significant signals. We devise an IBD-based test for association between phenotype and ultra-rare loss-of-function variation, identifying 29 association signals in 7 blood-related traits.
Detection of Identical-By-Descent (IBD) segments provides a fundamental measure of genetic relatedness and plays a key role in a wide range of genomic analyses. We developed a new method, called FastSMC, that enables accurate biobank-scale detection of IBD segments transmitted by common ancestors living up to several hundreds of generations in the past. FastSMC combines a fast heuristic search for IBD segments with accurate coalescent-based likelihood calculations and enables estimating the age of common ancestors transmitting IBD regions. We applied FastSMC to 487,409 phased samples from the UK Biobank and detected the presence of ∼214 billion IBD segments transmitted by shared ancestors within the past 1,500 years. We quantified time-dependent shared ancestry within and across 120 postcodes, obtaining a fine-grained picture of genetic relatedness within the past two millennia in the UK. Sharing of common ancestors strongly correlates with geographic distance, enabling the localization of a sample's birth coordinates from genomic data. We sought evidence of recent positive selection by identifying loci with unusually strong shared ancestry within recent millennia and we detected 12 genome-wide significant signals, including 7 novel loci. We found IBD sharing to be highly predictive of the sharing of ultra-rare variants in exome sequencing samples from the UK Biobank. Focusing on loss-of-function variation discovered using exome sequencing, we devised an IBD-based association test and detected 29 associations with 7 blood-related traits, 20 of which were not detected in the exome sequencing study. These results underscore the importance of modelling distant relatedness to reveal subtle population structure, recent evolutionary history, and rare pathogenic variation.
Introduction
Large-scale genomic collection, through efforts like the NIH All of Us research program (1), the UK BioBank (2), and Genomics England (3), has yielded datasets of hundreds of thousands of individuals and is expected to grow to millions in the coming years. Utilizing such datasets to understand disease and health outcomes requires understanding the fine-scale genetic relationships between individuals. These relationships can be characterized using short segments (less than 10 centimorgans [cM] in length) that are inherited identical by descent (IBD) from a common ancestor between purportedly "unrelated" pairs of individuals in a dataset (4). Accurate detection of shared IBD segments has a number of downstream applications, which include reconstructing the fine-scale demographic history of a population (5-8), detecting signatures of recent adaptation (9, 10), discovering phenotypic association (11, 12), estimating haplotype phase (4, 13, 14), and imputing missing genotype data (15, 16), a key step in genome-wide association studies (GWAS) (17). Detection of IBD segments in millions of individuals within modern biobanks poses a number of computational challenges. A...