Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing Volume 2 - EMNLP '09 2009
DOI: 10.3115/1699571.1699635

Web-scale distributional similarity and entity set expansion

Abstract: Computing the pairwise semantic similarity between all words on the Web is a computationally challenging task. Parallelization and optimizations are necessary. We propose a highly scalable implementation based on distributional similarity, implemented in the MapReduce framework and deployed over a 200 billion word crawl of the Web. The pairwise similarity between 500 million terms is computed in 50 hours using 200 quad-core nodes. We apply the learned similarity matrix to the task of automatic set expansion an…
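To make the idea in the abstract concrete, here is a minimal single-machine sketch of pairwise distributional similarity organized as a map step and a reduce step. It is not the paper's implementation: the raw co-occurrence counts, the fixed context window, the cosine measure, and every function name (`cooccurrence_vectors`, `map_phase`, `reduce_phase`, `pairwise_cosine`) are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each term to a Counter of context words seen within `window` tokens."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, term in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[term][tokens[j]] += 1
    return vectors


def map_phase(vectors):
    """Emit (context, (term, weight)) records: an inverted index keyed by context."""
    for term, contexts in vectors.items():
        for context, weight in contexts.items():
            yield context, (term, weight)


def reduce_phase(records):
    """Group records by context and sum partial dot products per term pair."""
    by_context = defaultdict(list)
    for context, posting in records:
        by_context[context].append(posting)
    dots = Counter()
    for postings in by_context.values():
        # Only terms that share a context contribute, so unrelated pairs are never touched.
        for (t1, w1), (t2, w2) in combinations(sorted(postings), 2):
            dots[(t1, t2)] += w1 * w2
    return dots


def pairwise_cosine(sentences, window=2):
    """Return {(term_a, term_b): cosine} for all pairs sharing at least one context."""
    vectors = cooccurrence_vectors(sentences, window)
    norms = {t: sqrt(sum(w * w for w in v.values())) for t, v in vectors.items()}
    dots = reduce_phase(map_phase(vectors))
    return {pair: dot / (norms[pair[0]] * norms[pair[1]]) for pair, dot in dots.items()}


if __name__ == "__main__":
    sims = pairwise_cosine(["the dog barks", "the cat barks"])
    for pair, score in sorted(sims.items()):
        print(pair, round(score, 3))
```

Keys in the result are term pairs in alphabetical order, so `sims[("cat", "dog")]` looks up the similarity of "cat" and "dog". Grouping by context rather than comparing every pair of terms directly is one common way to avoid work on pairs with no shared features; whether the paper uses this exact decomposition is not stated in the abstract.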

Cited by 175 publications (129 citation statements)
References 44 publications
“…MapReduce has been used for computing similarity between words or objects on the Web [1,9,20]. Several works discuss using CPU and GPU environments.…”
Section: Related Work (mentioning)
confidence: 99%
“…For example, "dog" and "cat" should have a high peer similarity score. Following existing work (Hearst, 1992; Kozareva et al., 2008; Shi et al., 2010; Agirre et al., 2009; Pantel et al., 2009), we built a peer similarity graph containing about 40.5 million nodes and 1.33 billion edges.…”
Section: Building Term Clusters (mentioning)
confidence: 99%
“…In such work, a word is represented by the distribution of other words that co-occur with it. Distributional representations of words have been successfully used in many language processing tasks such as entity set expansion (Pantel et al., 2009), part-of-speech (POS) tagging and chunking (Huang and Yates, 2009), ontology learning (Curran, 2005), computing semantic textual similarity (Besançon et al., 1999), and lexical inference (Kotlerman et al., 2012).…”
Section: Introduction (mentioning)
confidence: 99%