2019
DOI: 10.48550/arxiv.1911.04200
Preprint

Communication-Efficient Jaccard Similarity for High-Performance Distributed Genome Comparisons

Abstract: The Jaccard similarity index is an important measure of the overlap of two sets, widely used in machine learning, computational genomics, information retrieval, and many other areas. We design and implement SimilarityAtScale, the first communication-efficient distributed algorithm for computing the Jaccard similarity among pairs of large datasets. Our algorithm provides an efficient encoding of this problem into a multiplication of sparse matrices. Both the encoding and sparse matrix product are performed in a…
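
The encoding mentioned in the abstract can be illustrated with a small single-process sketch. This is not the SimilarityAtScale implementation and it is not distributed; it only assumes the standard formulation in which a sparse indicator matrix A (rows are elements such as k-mers, columns are samples) yields all pairwise intersection sizes through the product AᵀA, after which J(i, j) = |i ∩ j| / (|i| + |j| − |i ∩ j|). The function name `jaccard_via_sparse_product` and the dictionary-based sparse storage are hypothetical choices made for this illustration.

```python
from collections import defaultdict
from itertools import combinations


def jaccard_via_sparse_product(samples):
    """Pairwise Jaccard similarity for a list of sets.

    Illustrative sketch only: the indicator matrix A is stored sparsely,
    row by row, as element -> list of sample indices containing it.
    Returns a dict {(i, j): similarity} for all i < j.
    """
    # Build the nonzero pattern of A: one row per distinct element.
    rows = defaultdict(list)
    for j, sample in enumerate(samples):
        for element in sample:
            rows[element].append(j)

    # B = A^T A restricted to i < j: each row of A contributes 1 to B[i][j]
    # for every pair of samples (i, j) that share that element.
    intersections = defaultdict(int)
    for cols in rows.values():
        for i, j in combinations(sorted(cols), 2):
            intersections[(i, j)] += 1

    sizes = [len(sample) for sample in samples]
    result = {}
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            shared = intersections.get((i, j), 0)
            union = sizes[i] + sizes[j] - shared
            # Assumption: two empty sets are treated as identical (1.0).
            result[(i, j)] = shared / union if union else 1.0
    return result


if __name__ == "__main__":
    # Toy "genomes" represented as sets of 3-mers.
    toy = [{"ACG", "CGT", "GTA"}, {"CGT", "GTA", "TAC"}, {"AAA"}]
    print(jaccard_via_sparse_product(toy))
```

Only elements shared by at least two samples contribute to the intersection counts, which is the property that makes a sparse product attractive when most elements occur in few samples.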

citations
Cited by 3 publications
(3 citation statements)
references
References 64 publications
“…These works target popular graph algorithms such as BFS or PageRank. Multiplication of matrices and vectors [26,87] has also been addressed in the context of FPGAs [55,56,92,126,134,151]; these efforts could be used for energy-efficient and high-performance graph analytics on FPGAs due to the possibility of expressing graph algorithms in the language of linear algebra [82]. Our work differs from these designs as we focus on the problem of finding graph matchings.…”
Section: Graph Processing on FPGAs (mentioning)
confidence: 99%
“…One could also investigate topology-aware or routing-aware data distribution for graph streaming, especially together with recent high-performance network topologies [29], [130] and routing [37], [144], [88]. Finally, ensuring speedups due to different data modeling abstractions, such as the algebraic abstraction [126], [33], [34], [136], may be a promising direction.…”
Section: Challenges (mentioning)
confidence: 99%
“…Large graphs are a basis of many problems in machine learning, medicine, social network analysis, computational sciences, and others [15,25,106]. The growing graph sizes, reaching one trillion edges in 2015 (the Facebook social graph [48]) and 12 trillion edges in 2018 (the Sogou webgraph [101]), require unprecedented amounts of compute power, storage, and energy.…”
Section: Introduction (mentioning)
confidence: 99%