A Short Survey on Taxonomy Learning from Text Corpora: Issues, Resources and Recent Advances

Wang, Chengyu; He, Xiaofeng; Zhou, Aoying

doi:10.18653/v1/d17-1123

Cited by 54 publications

(41 citation statements)

References 79 publications

“…We generate weak supervision for the hypernymy inference model from the corpus D of the input text-rich HIN. As a pioneering method, the Hearst pattern [13] has been shown to have decent precision [22,53,57]. We use this method to extract a list S =…”

Section: Weak Supervision Acquisition (mentioning)

confidence: 99%
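
The excerpt above uses Hearst patterns as a high-precision source of weak supervision. As a rough illustration only, the sketch below matches a single "X such as Y, Z and W" pattern with a hand-written regular expression. It assumes single-word terms, and the name `extract_pairs` is hypothetical; it is not the cited paper's implementation, which would normally match noun phrases and use several patterns.

```python
import re

# One illustrative Hearst pattern: "<hypernym> such as <hyponym>, <hyponym> and <hyponym>".
# Assumption: every term is a single word; real systems match noun phrases.
SUCH_AS = re.compile(
    r"(\w+) such as ((?:\w+(?:, and | and |, or | or |, ))*\w+)"
)

# Separators between the listed hyponyms.
SEPARATOR = re.compile(r",? (?:and|or) |, ")


def extract_pairs(sentence):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group(1)
        for hyponym in SEPARATOR.split(match.group(2)):
            pairs.append((hyponym, hypernym))
    return pairs


pairs = extract_pairs("We study mammals such as pandas, tigers and lions.")
print(pairs)
```

Pairs extracted this way are precise but sparse, which is why the excerpt treats them as weak labels for training a model rather than as the final output.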

“…Distributional Method for Hypernymy Discovery. Distributional methods constitute one major line of research for hypernymy discovery [50,53] and can be adapted to hypernymy discovery from network data. Early studies proposed symmetric distributional measures for hypernymy discovery that only capture relevance between terms [20].…”

Section: Case Study: Taxonomy Construction (mentioning)

confidence: 99%

Discovering Hypernymy in Text-Rich Heterogeneous Information Network by Exploiting Context Granularity

Shi, Shen, et al. 2019

Proceedings of the 28th ACM International Conference on Information and Knowledge Management

Text-rich heterogeneous information networks (text-rich HINs) are ubiquitous in real-world applications. Hypernymy, also known as is-a relation or subclass-of relation, lays in the core of many knowledge graphs and benefits many downstream applications. Existing methods of hypernymy discovery either leverage textual patterns to extract explicitly mentioned hypernym-hyponym pairs, or learn a distributional representation for each term of interest based on its context. These approaches rely on statistical signals from the textual corpus, and their effectiveness would therefore be hindered when the signals from the corpus are not sufficient for all terms of interest. In this work, we propose to discover hypernymy in text-rich HINs, which can introduce additional high-quality signals. We develop a new framework, named HyperMine, that exploits multi-granular contexts and combines signals from both text and network without human labeled data. HyperMine extends the definition of "context" to the scenario of text-rich HIN. For example, we can define typed nodes and communities as contexts. These contexts encode signals of different granularities and we feed them into a hypernymy inference model. HyperMine learns this model using weak supervision acquired based on high-precision textual patterns. Extensive experiments on two large real-world datasets demonstrate the effectiveness of HyperMine and the utility of modeling context granularity. We further show a case study that a high-quality taxonomy can be generated solely based on the hypernymy discovered by HyperMine.

“…Traditionally, identifying hypernymic relations from text corpora has been addressed with two main approaches: pattern-based and distributional (Wang et al., 2017). Pattern-based (path-based) methods, which provide higher precision at the price of lower coverage, exploit the co-occurrence of a hyponym and its hypernym in a textual corpus (Hearst, 1992; Navigli and Velardi, 2010; Boella and Di Caro, 2013; Flati et al., 2016; Gupta et al., 2016; Pavlick and Pasca, 2017).…”

Section: Related Work (mentioning)

confidence: 99%

SemEval-2018 Task 9: Hypernym Discovery

Camacho-Collados, Bovi, Espinosa-Anke, et al. 2018

Proceedings of the 12th International Workshop on Semantic Evaluation

This paper describes the SemEval 2018 Shared Task on Hypernym Discovery. We put forward this task as a complementary benchmark for modeling hypernymy, a problem which has traditionally been cast as a binary classification task, taking a pair of candidate words as input. Instead, our reformulated task is defined as follows: given an input term, retrieve (or discover) its suitable hypernyms from a target corpus. We proposed five different subtasks covering three languages (English, Spanish, and Italian), and two specific domains of knowledge in English (Medical and Music). Participants were allowed to compete in any or all of the subtasks. Overall, a total of 11 teams participated, with a total of 39 different systems submitted through all subtasks. Data, results and further information about the task can be found at https://competitions.codalab.org/competitions/17119.

“…Evaluating the quality of an entire taxonomy is challenging due to the existence of multiple aspects that should be considered and the difficulty of obtaining gold standard [43]. Following [5,6,20], we use Ancestor-F1 and Edge-F1 for taxonomy evaluation in this study.…”

Section: Evaluation Metrics (mentioning)

confidence: 99%
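
The excerpt above names Ancestor-F1 and Edge-F1 without defining them. The sketch below follows the common reading of these metrics, stated here as an assumption rather than taken from the citing paper: Edge-F1 compares predicted and gold (parent, child) edges directly, while Ancestor-F1 compares all (ancestor, descendant) pairs implied by those edges. The function names and the toy taxonomy are hypothetical.

```python
def f1(predicted, gold):
    """Harmonic mean of precision and recall between two sets of pairs."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def ancestor_pairs(edges):
    """All (ancestor, descendant) pairs implied by (parent, child) edges."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, set()).add(child)
    pairs = set()
    for ancestor in children:
        stack = list(children[ancestor])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            pairs.add((ancestor, node))
            stack.extend(children.get(node, ()))
    return pairs


# Toy gold and predicted taxonomies as (parent, child) edges.
gold = {("animal", "mammal"), ("mammal", "panda"), ("animal", "bird")}
predicted = {("animal", "mammal"), ("animal", "panda"), ("animal", "bird")}

edge_f1 = f1(predicted, gold)
ancestor_f1 = f1(ancestor_pairs(predicted), ancestor_pairs(gold))
print(edge_f1, ancestor_f1)
```

In this toy case the prediction attaches "panda" directly under "animal": Edge-F1 counts that edge as wrong, while Ancestor-F1 still credits the correct (animal, panda) ancestry and only penalizes the missing (mammal, panda) pair.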

“…Existing methods mostly build taxonomies based on "is-A" relations (e.g., a "panda" is a "mammal" and a "mammal" is an "animal") [42,43,48] by first leveraging pattern-based or distributional methods to extract hypernym-hyponym term pairs and then organizing them into a tree-structured hierarchy. However, such hierarchies cannot satisfy many real-world needs due to its (1) inflexible semantics: many applications may need hierarchies carrying more flexible semantics such as "city-state-country" in a location taxonomy; and (2) limited applicability: the "universal" taxonomy so constructed is unlikely to fit diverse and user-specific application tasks.…”

Section: Introduction (mentioning)

confidence: 99%

HiExpan

Shen, Lei, et al. 2018

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

Taxonomies are of great value to many knowledge-rich applications. As the manual taxonomy curation costs enormous human efforts, automatic taxonomy construction is in great demand. However, most existing automatic taxonomy construction methods can only build hypernymy taxonomies wherein each edge is limited to expressing the "is-a" relation. Such a restriction limits their applicability to more diverse real-world tasks where the parent-child may carry different relations. In this paper, we aim to construct a task-guided taxonomy from a domain-specific corpus, and allow users to input a "seed" taxonomy, serving as the task guidance. We propose an expansion-based taxonomy construction framework, namely HiExpan, which automatically generates key term list from the corpus and iteratively grows the seed taxonomy. Specifically, HiExpan views all children under each taxonomy node forming a coherent set and builds the taxonomy by recursively expanding all these sets. Furthermore, HiExpan incorporates a weakly-supervised relation extraction module to extract the initial children of a newly-expanded node and adjusts the taxonomy tree by optimizing its global structure. Our experiments on three real datasets from different domains demonstrate the effectiveness of HiExpan for building task-guided taxonomies.
