2017 Proceedings of the Nineteenth Workshop on Algorithm Engineering and Experiments (ALENEX) 2017
DOI: 10.1137/1.9781611974768.12

Efficient Set Intersection Counting Algorithm for Text Similarity Measures

Abstract: Set intersection counting appears as a subroutine in many techniques used in natural language processing, in which similarity is often measured as a function of document co-occurrence counts between pairs of noun phrases or entities. Such techniques include clustering of text phrases and named entities, topic labeling, entity disambiguation, sentiment analysis, and search for synonyms. These techniques can have real-time constraints that require very fast computation of thousands of set intersection counting quer…
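To make the subroutine concrete, the following is a minimal baseline sketch of set intersection counting, not the algorithm proposed in the paper. It assumes each phrase is mapped to a sorted, duplicate-free list of the IDs of documents that contain it, and counts shared IDs with a linear merge. The function name `intersection_count` and the `postings` data are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def intersection_count(a: Sequence[int], b: Sequence[int]) -> int:
    """Count IDs common to two sorted, duplicate-free ID lists (linear merge)."""
    i = j = count = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Same document contains both phrases: one co-occurrence.
            count += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return count


# Hypothetical data: phrase -> sorted IDs of documents containing it.
postings = {
    "neural network": [1, 4, 7, 9, 12],
    "deep learning": [2, 4, 9, 12, 15],
    "stock market": [3, 5, 8],
}

cooccurrence = intersection_count(postings["neural network"], postings["deep learning"])
```

Each query costs time proportional to the combined length of the two lists, which is the kind of per-query cost that becomes significant when thousands of such counts are needed under real-time constraints.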

Citations: cited by 2 publications (2 citation statements)
References: 30 publications (44 reference statements)
“…Fast algorithms for intersection of sets and sequences are described in [8,6,9] while [5] works for sorted sequences. For text similarity, efficient set intersection algorithm is discussed in [25].…”
Section: Related Work (mentioning)
confidence: 99%
“…The bottlenecks are the random walk betweenness centrality, and the PMI API calls, as the remaining features can be computed in seconds. However, in principle, a fast batch PMI implementation [18] can replace individual API calls, so the main bottleneck is the random walk betweenness centrality feature.…”
Section: Evaluating Our Ranking Model (mentioning)
confidence: 99%