2019
DOI: 10.1101/749507
Preprint

Rapid detection of identity-by-descent tracts for mega-scale datasets

Abstract: The ability to identify segments of genomes identical-by-descent (IBD) is a part of standard workflows in both statistical and population genetics. However, traditional methods for finding local IBD across all pairs of individuals scale poorly, leading to a lack of adoption in very large-scale datasets. Here, we present iLASH, IBD by LocAlity-Sensitive Hashing, an algorithm based on similarity detection techniques that shows equal or improved accuracy in simulations compared to the current leading method and sp…
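
The abstract names locality-sensitive hashing (LSH) as the basis of iLASH but is cut off before describing the method. As a rough illustration of the general technique only, and not of iLASH's actual implementation, the sketch below applies MinHash with banding to toy haplotype strings. Every name and parameter here (`shingles`, `minhash_signature`, `candidate_pairs`, the shingle length `k`, the number of hashes and bands) is a hypothetical choice made for this example.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def stable_hash(seed, item):
    """Deterministic 64-bit hash of (seed, item); stands in for a hash family."""
    digest = hashlib.blake2b(repr((seed, item)).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def shingles(haplotype, k=4):
    """Overlapping k-mers of an allele string, tagged with their start position."""
    return {(i, haplotype[i:i + k]) for i in range(len(haplotype) - k + 1)}


def minhash_signature(items, num_hashes=12):
    """One minimum hash value per seed; similar sets tend to share minima."""
    return tuple(
        min(stable_hash(seed, item) for item in items)
        for seed in range(num_hashes)
    )


def candidate_pairs(haplotypes, num_hashes=12, bands=4):
    """Return pairs of sample names whose signatures agree on at least one band."""
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for name, hap in haplotypes.items():
        sig = minhash_signature(shingles(hap), num_hashes)
        for b in range(bands):
            # Samples with an identical band land in the same bucket.
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(name)
    pairs = set()
    for names in buckets.values():
        pairs.update(combinations(sorted(names), 2))
    return pairs


# Toy data (made up): "a" and "b" are identical, "c" is the complement of "a".
haplotypes = {
    "a": "0110100111010001",
    "b": "0110100111010001",
    "c": "1001011000101110",
}
print(candidate_pairs(haplotypes))
```

Only pairs that share a bucket are compared further, which is how LSH avoids examining all pairs of individuals. In this toy input, the identical haplotypes `a` and `b` collide in every band, while `c` shares no position-tagged shingle with either and so should not be paired with them.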

Cited by 17 publications
(19 citation statements)
References 37 publications
“…Our analysis detected an average of 1.8 IBD segments per pair in the UK Biobank dataset within the past 50 generations. This is consistent with a previous study focusing on longer and more recent segments (average of 0.1 segments >2.9 cM per pair 66), but less than another recent study in a similar length range (average 1.96 segments >2 cM per pair 67). Taking uncertainty of the detected IBD segments into account may reconcile these estimates.…”
Section: Discussion (supporting)
confidence: 91%
“…FastSMC's identification step currently relies on the GERMLINE2 genotype hashing strategy. It will be interesting to test other heuristic strategies for rapidly identifying identical segments, such as the locality-sensitive hashing strategy recently implemented in the iLASH algorithm (exhibiting 95% concordance with GERMLINE in application to real multi-ethnic data 66), or methods that rely on the positional Burrows-Wheeler transform (PBWT) data structure 17,67,68. Several methods now exist to reconstruct gene genealogies in large samples [69][70][71][72].…”
Section: Discussion (mentioning)
confidence: 99%
“…PONDEROSA takes as input IBD segment estimates (either in GERMLINE or iLASH format), as well as pairwise IBD1 and IBD2 values in a KING-formatted file (to define parent-offspring pairs) and a PLINK-formatted .fam file (to define known paths through the pedigree) (Gusev et al. 2009; Shemirani and Belbin 2019; Manichaikul et al. 2010). The user can also supply reported age data (to constrict possible pedigree relationships) and a PLINK .ped file (to more accurately merge IBD segments).…”
Section: Ponderosa Implementation (mentioning)
confidence: 99%
“…This is because the former has a linear time algorithm [19] while the latter needs quadratic time algorithms. Among fast IBD segment detection methods, hash table-based methods [16, 17] are typically memory intensive. RaPID [15] and hap-IBD [18] are based on the scanning algorithm of PBWT and are scaling up both in terms of run time and memory.…”
Section: Introduction (mentioning)
confidence: 99%