Rapid detection of identity-by-descent tracts for mega-scale datasets

Shemirani, Ruhollah; Belbin, Gillian M.; Avery, Christy L.; Gignoux, Christopher R.; Ambite, José Luis

doi:10.1101/749507

Cited by 17 publications

(19 citation statements)

References 37 publications

“…Our analysis detected an average of 1.8 IBD segments per pair in the UK Biobank dataset within the past 50 generations. This is consistent with a previous study focusing on longer and more recent segments (average of 0.1 segments >2.9 cM per pair [66]), but less than another recent study in a similar length range (average 1.96 segments >2 cM per pair [67]). Taking uncertainty of the detected IBD segments into account may reconcile these estimates.…”

Section: Discussion (supporting)

confidence: 91%

“…FastSMC's identification step currently relies on the GERMLINE2 genotype hashing strategy. It will be interesting to test other heuristic strategies for rapidly identifying identical segments, such as the locality-sensitive hashing strategy recently implemented in the iLASH algorithm (exhibiting 95% concordance with GERMLINE in application to real multi-ethnic data [66]), or methods that rely on the positional Burrows-Wheeler transform (PBWT) data structure [17,67,68]. Several methods now exist to reconstruct gene genealogies in large samples [69][70][71][72].…”

Section: Discussion (mentioning)

confidence: 99%

Identity-by-descent detection across 487,409 British samples reveals fine scale population structure and ultra-rare variant associations

et al. 2020

Detection of Identical-By-Descent (IBD) segments provides a fundamental measure of genetic relatedness and plays a key role in a wide range of analyses. We develop FastSMC, an IBD detection algorithm that combines a fast heuristic search with accurate coalescent-based likelihood calculations. FastSMC enables biobank-scale detection and dating of IBD segments within several thousands of years in the past. We apply FastSMC to 487,409 UK Biobank samples and detect ~214 billion IBD segments transmitted by shared ancestors within the past 1500 years, obtaining a fine-grained picture of genetic relatedness in the UK. Sharing of common ancestors strongly correlates with geographic distance, enabling the use of genomic data to localize a sample’s birth coordinates with a median error of 45 km. We seek evidence of recent positive selection by identifying loci with unusually strong shared ancestry and detect 12 genome-wide significant signals. We devise an IBD-based test for association between phenotype and ultra-rare loss-of-function variation, identifying 29 association signals in 7 blood-related traits.

“…PONDEROSA takes as input IBD segment estimates (either in GERMLINE or iLASH format), as well as pairwise IBD1 and IBD2 values in a KING-formatted file (to define parent-offspring pairs) and a PLINK-formatted .fam file (to define known paths through the pedigree) (Gusev et al 2009; Shemirani and Belbin 2019; Manichaikul et al 2010). The user can also supply reported age data (to constrict possible pedigree relationships) and a PLINK .ped file (to more accurately merge IBD segments).…”

Section: Ponderosa Implementation (mentioning)

confidence: 99%

A rapid, accurate approach to inferring pedigrees in endogamous populations

Williams, Scelza, Daya, et al. 2020

Preprint

Self Cite

Accurate reconstruction of pedigrees from genetic data remains a challenging problem. Pedigree inference algorithms are often trained only on urban European-descent families, which are comparatively 'outbred' compared to many other global populations. Relationship categories can be difficult to distinguish (e.g. half-sibships versus avuncular) without external information. Furthermore, published software cannot accommodate endogamous populations where there may be reticulations within a pedigree (i.e. inbreeding) or elevated haplotype sharing. We design a simple, rapid algorithm which initially uses only high-confidence first degree relationships to seed a machine learning step based on the number of identical by descent segments. Additionally, we define a new statistic to polarize individuals to ancestor versus descendant generation. We test our approach in a sample of 700 individuals from northern Namibia, sampled from an endogamous population. Due to a culture of concurrent relationships in this population, there is a high proportion of half-sibships. We accurately identify first through third degree relationships for all categories, including half-sibships, half-avuncular-ships etc. Accurate reconstruction of pedigrees holds promise for tracing allele frequency trajectories, improved phasing and other population genomic questions.

“…This is because the former has a linear time algorithm [19] while the latter needs quadratic time algorithms. Among fast IBD segment detection methods, hash table-based methods [16, 17] are typically memory intensive. RaPID [15] and hap-IBD [18] are based on the scanning algorithm of PBWT and are scaling up both in terms of run time and memory.…”

Section: Introduction (mentioning)

confidence: 99%

RAFFI: Accurate and fast familial relationship inference in large scale biobank studies using RaPID

et al. 2021

Inference of relationships from whole-genome genetic data of a cohort is a crucial prerequisite for genome-wide association studies. Typically, relationships are inferred by computing the kinship coefficients (ϕ) and the genome-wide probability of zero IBD sharing (π0) among all pairs of individuals. Current leading methods are based on pairwise comparisons, which may not scale up to very large cohorts (e.g., sample size >1 million). Here, we propose an efficient relationship inference method, RAFFI. RAFFI leverages the efficient RaPID method to call IBD segments first, then estimate the ϕ and π0 from detected IBD segments. This inference is achieved by a data-driven approach that adjusts the estimation based on phasing quality and genotyping quality. Using simulations, we showed that RAFFI is robust against phasing/genotyping errors, admix events, and varying marker densities, and achieves higher accuracy compared to KING, the current leading method, especially for more distant relatives. When applied to the phased UK Biobank data with ~500K individuals, RAFFI is approximately 18 times faster than KING. We expect RAFFI will offer fast and accurate relatedness inference for even larger cohorts.
