Juba Nait Saada scite author profile

Detection of Identical-By-Descent (IBD) segments provides a fundamental measure of genetic relatedness and plays a key role in a wide range of analyses. We develop FastSMC, an IBD detection algorithm that combines a fast heuristic search with accurate coalescent-based likelihood calculations. FastSMC enables biobank-scale detection and dating of IBD segments within several thousands of years in the past. We apply FastSMC to 487,409 UK Biobank samples and detect ~214 billion IBD segments transmitted by shared ancestors within the past 1500 years, obtaining a fine-grained picture of genetic relatedness in the UK. Sharing of common ancestors strongly correlates with geographic distance, enabling the use of genomic data to localize a sample’s birth coordinates with a median error of 45 km. We seek evidence of recent positive selection by identifying loci with unusually strong shared ancestry and detect 12 genome-wide significant signals. We devise an IBD-based test for association between phenotype and ultra-rare loss-of-function variation, identifying 29 association signals in 7 blood-related traits.


Identity-by-descent detection across 487,409 British samples reveals fine-scale population structure, evolutionary history, and trait associations

Saada, Kalantzis, Shyr, et al. 2020

Preprint


Detection of Identical-By-Descent (IBD) segments provides a fundamental measure of genetic relatedness and plays a key role in a wide range of genomic analyses. We developed a new method, called FastSMC, that enables accurate biobank-scale detection of IBD segments transmitted by common ancestors living up to several hundreds of generations in the past. FastSMC combines a fast heuristic search for IBD segments with accurate coalescent-based likelihood calculations and enables estimating the age of common ancestors transmitting IBD regions. We applied FastSMC to 487,409 phased samples from the UK Biobank and detected the presence of ∼214 billion IBD segments transmitted by shared ancestors within the past 1,500 years. We quantified time-dependent shared ancestry within and across 120 postcodes, obtaining a fine-grained picture of genetic relatedness within the past two millennia in the UK. Sharing of common ancestors strongly correlates with geographic distance, enabling the localization of a sample's birth coordinates from genomic data. We sought evidence of recent positive selection by identifying loci with unusually strong shared ancestry within recent millennia and we detected 12 genome-wide significant signals, including 7 novel loci. We found IBD sharing to be highly predictive of the sharing of ultra-rare variants in exome sequencing samples from the UK Biobank. Focusing on loss-of-function variation discovered using exome sequencing, we devised an IBD-based association test and detected 29 associations with 7 blood-related traits, 20 of which were not detected in the exome sequencing study. These results underscore the importance of modelling distant relatedness to reveal subtle population structure, recent evolutionary history, and rare pathogenic variation.
Introduction

Large-scale genomic collection, through efforts like the NIH All of Us research program (1), the UK Biobank (2), and Genomics England (3), has yielded datasets of hundreds of thousands of individuals and is expected to grow to millions in the coming years. Utilizing such datasets to understand disease and health outcomes requires understanding the fine-scale genetic relationships between individuals. These relationships can be characterized using short segments (less than 10 centimorgans [cM] in length) that are inherited identical by descent (IBD) from a common ancestor between purportedly "unrelated" pairs of individuals in a dataset (4). Accurate detection of shared IBD segments has a number of downstream applications, which include reconstructing the fine-scale demographic history of a population (5-8), detecting signatures of recent adaptation (9, 10), discovering phenotypic association (11, 12), estimating haplotype phase (4, 13, 14), and imputing missing genotype data (15, 16), a key step in genome-wide association studies (GWAS) (17). Detection of IBD segments in millions of individuals within modern biobanks poses a number of computational challenges. A...


Identity-by-descent detection across 487,409 British samples reveals fine scale population structure and ultra-rare variant associations
