SetExpan: Corpus-Based Set Expansion via Context Feature Selection and Rank Ensemble

Shen, Jiaming; Wu, Zeqiu; Lei, Dongming; Shang, Jingbo; Ren, Xiang; Han, Jiawei

doi:10.1007/978-3-319-71249-9_18

Cited by 82 publications

(106 citation statements)

References 21 publications

(29 reference statements)

Supporting

Mentioning

102

Contrasting


“…Meanwhile, they refine the context feature pool by including only those features which are commonly shared by entities in the expanded set. Based on this philosophy, SetExpan [19] develops a context feature selection module to select quality skip-gram features and designs a rank ensemble module to select quality entities. Similarly, SetExpander [13] captures distributional similarity on five different context types and learns a classifier to combine multiple contexts using an additional labeled dataset.…”

Section: Related Work (mentioning)

confidence: 99%

“…Therefore, context dependent similarity benefits set expansion tasks in that it only captures the type-indicative features of entities. We adopt the context dependent similarity function Sim(e_i, e_j | F) defined in [19] using the weighted Jaccard similarity measure:…”

Section: Algorithm 1: Cross-seed Parallel Relations Clustering (mentioning)

confidence: 99%
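
The quoted statement names a weighted Jaccard similarity restricted to a feature set F but is cut off before the formula. As a rough illustration only, the standard weighted Jaccard form (sum of per-feature minima over sum of per-feature maxima) can be sketched as follows. The function name, the dictionary layout, and the toy weights are assumptions made for this example and are not taken from [19].

```python
def weighted_jaccard(weights_i, weights_j, features):
    """Standard weighted Jaccard of two entities, restricted to `features`.

    weights_i / weights_j map a context feature to a non-negative weight
    (hypothetical layout; the cited papers may store weights differently).
    """
    numerator = 0.0
    denominator = 0.0
    for f in features:
        wi = weights_i.get(f, 0.0)
        wj = weights_j.get(f, 0.0)
        numerator += min(wi, wj)
        denominator += max(wi, wj)
    # No shared support at all: define the similarity as 0.
    return numerator / denominator if denominator > 0 else 0.0


# Toy weights, invented for illustration.
usa = {"president of __": 3.0, "__ and Canada": 2.0, "visited __": 1.0}
russia = {"president of __": 2.0, "visited __": 1.0}

selected = ["president of __", "visited __"]          # the feature set F
score = weighted_jaccard(usa, russia, selected)       # (2 + 1) / (3 + 1)
score_all = weighted_jaccard(usa, russia, list(usa))  # (2 + 0 + 1) / (3 + 2 + 1)
```

Restricting the sum to F is what makes the score context dependent: features outside F, such as "__ and Canada" above, do not lower the similarity.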

“…Then we can update the co-occurrence matrix by where flex(c) means the flexible transformation of local context c. Then we can merge those infrequent but type-indicative skip-grams together. However, one might say a trivial solution to this data sparsity issue is to exhaust a variety of window sizes (e.g., [-2,+2], [-1,+1], [-1,0], [0,+1]), as has been done by previous studies [18,19,26]. However, this approach very likely ends up generating many skip-grams that are too general.…”

Section: Dynamically (mentioning)

confidence: 99%
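
To make the window-size discussion concrete, here is a minimal sketch of extracting one skip-gram per window around a target token, using the four windows listed in the quote. The `__` placeholder, the function name, and the example sentence are assumptions for illustration; the cited studies may tokenize and format skip-grams differently.

```python
def skipgrams(tokens, index, windows=((-2, 2), (-1, 1), (-1, 0), (0, 1))):
    """Return one skip-gram per window, with the target token replaced by '__'."""
    grams = []
    for left, right in windows:
        start = index + left
        end = index + right
        if start < 0 or end >= len(tokens):
            continue  # window runs past the sentence boundary; skip it
        parts = tokens[start:index] + ["__"] + tokens[index + 1:end + 1]
        grams.append(" ".join(parts))
    return grams


tokens = "the president of USA visited Canada".split()
grams = skipgrams(tokens, tokens.index("USA"))
# grams == ["president of __ visited Canada", "of __ visited", "of __", "__ visited"]
```

The narrow windows yield contexts such as "of __" that match almost any noun, which is the "too general" problem the quoted statement raises.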


Guiding Corpus-based Set Expansion by Auxiliary Sets Generation and Co-Expansion

Huang, Xie, et al. 2020

Proceedings of the Web Conference 2020

Self Cite


Given a small set of seed entities (e.g., "USA", "Russia"), corpus-based set expansion is to induce an extensive set of entities which share the same semantic class (Country in this example) from a given corpus. Set expansion benefits a wide range of downstream applications in knowledge discovery, such as web search, taxonomy construction, and query suggestion. Existing corpus-based set expansion algorithms typically bootstrap the given seeds by incorporating lexical patterns and distributional similarity. However, due to no negative sets provided explicitly, these methods suffer from semantic drift caused by expanding the seed set freely without guidance. We propose a new framework, Set-CoExpan, that automatically generates auxiliary sets as negative sets that are closely related to the target set of user's interest, and then performs multiple sets co-expansion that extracts discriminative features by comparing target set with auxiliary sets, to form multiple cohesive sets that are distinctive from one another, thus resolving the semantic drift issue. In this paper we demonstrate that by generating auxiliary sets, we can guide the expansion process of target set to avoid touching those ambiguous areas around the border with auxiliary sets, and we show that Set-CoExpan outperforms strong baseline methods significantly.


“…We focus on corpus-based approaches based on the distributional similarity hypothesis (Harris, 1954). State-of-the-art techniques return the k nearest neighbors around the seed terms as the expanded set, where terms are represented by their co-occurrence or embedding vectors in a training corpus according to different context types, such as linear window context (Pantel et al., 2009; Shi et al., 2010; Rong et al., 2016; Zaheer et al., 2017; Gyllensten and Sahlgren, 2018; Zhao et al., 2018), explicit lists (Roark and Charniak, 1998; Sarmento et al., 2007; He and Xin, 2011), coordinational patterns (Sarmento et al., 2007) and unary patterns (Rong et al., 2016; Shen et al., 2017). In this work, we generalize coordinational patterns, look at additional context types and combine multiple context-type embeddings.…”

Section: Related Work (mentioning)

confidence: 99%
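
The k-nearest-neighbor formulation in the statement above can be sketched in a few lines. This is a toy illustration, assuming cosine similarity and a seed centroid as the query point; the quote does not specify either choice, and the vectors below are invented.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand(seeds, vectors, k):
    """Return the k non-seed terms closest to the centroid of the seed vectors."""
    dim = len(next(iter(vectors.values())))
    # Assumption: the seeds are summarized by their centroid.
    centroid = [sum(vectors[s][d] for s in seeds) / len(seeds) for d in range(dim)]
    candidates = [t for t in vectors if t not in seeds]
    candidates.sort(key=lambda t: cosine(vectors[t], centroid), reverse=True)
    return candidates[:k]


# Invented 2-d term vectors, for illustration only.
vectors = {
    "USA": [0.9, 0.1],
    "Russia": [0.8, 0.2],
    "China": [0.85, 0.15],
    "apple": [0.1, 0.9],
    "banana": [0.05, 0.95],
}
expanded = expand(["USA", "Russia"], vectors, k=1)
```

With these toy vectors the nearest non-seed term is "China". In practice each context type (linear window, explicit lists, patterns) would give its own vector space and its own neighbor list.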

Multi-Context Term Embeddings: the Use Case of Corpus-based Term Set Expansion

Mamou, Pereg, Wasserblat, et al. 2019

Proceedings of the 3rd Workshop on Evaluating Vector Space Representations For


In this paper, we present a novel algorithm that combines multi-context term embeddings using a neural classifier and we test this approach on the use case of corpus-based term set expansion. In addition, we present a novel and unique dataset for intrinsic evaluation of corpus-based term set expansion algorithms. We show that, over this dataset, our algorithm provides up to 5 mean average precision points over the best baseline.


“…Only the n top scoring contexts will have non-zero values in W, and these get the value f ρ . This notion of weighting contexts is similar to that used in the SetExpan framework (Shen et al., 2017), although the way they use it is different (they use weighted Jaccard similarity based on context weights). Their algorithm for calculating context weights is a special case of our algorithm, with no notion of limited support penalty, that is, they use ρ = 0.…”

Section: Details Of Calculating W (mentioning)

confidence: 99%

Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics

2018

