Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016) 2016
DOI: 10.18653/v1/s16-1206

TAXI at SemEval-2016 Task 13: a Taxonomy Induction Method based on Lexico-Syntactic Patterns, Substrings and Focused Crawling

Abstract: We present a system for taxonomy construction that reached the first place in all subtasks of the SemEval 2016 challenge on Taxonomy Extraction Evaluation. Our simple yet effective approach harvests hypernyms with substring inclusion and Hearst-style lexico-syntactic patterns from domain-specific texts obtained via language-model-based focused crawling. Extracted taxonomies are evaluated on English, Dutch, French and Italian for three domains each (Food, Environment and Science). Evaluations against a gold stan…
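
The substring-inclusion idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe how TAXI scores or filters substring matches, so the rule below (a vocabulary term that is a word-level suffix of a longer vocabulary term is treated as its hypernym) is an assumption made for illustration, and the function name `substring_hypernyms` is hypothetical.

```python
def substring_hypernyms(terms):
    """Return sorted (hyponym, hypernym) pairs found by suffix inclusion.

    Assumption for illustration only: a term is a hypernym candidate of a
    longer term when it equals a proper word-level suffix of that term,
    e.g. "juice" for "apple juice". This is not TAXI's actual scoring.
    """
    vocab = {t.lower() for t in terms}
    pairs = set()
    for term in vocab:
        words = term.split()
        # Try every proper suffix of the multi-word term.
        for i in range(1, len(words)):
            candidate = " ".join(words[i:])
            if candidate in vocab:
                pairs.add((term, candidate))
    return sorted(pairs)


if __name__ == "__main__":
    for hypo, hyper in substring_hypernyms(
        ["apple juice", "juice", "fresh apple juice", "apple"]
    ):
        print(f"{hypo} -> {hyper}")
```

Matching on whole words rather than raw characters avoids spurious pairs such as "ice" inside "juice", at the cost of missing hypernyms embedded in single-token compounds.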

Citations: cited by 58 publications (83 citation statements)
References: 22 publications
“…TAXI The methods for hypernym identification used in the TAXonomy Induction system (TAXI) rely on two sources of evidence: substring matching [Table 4: Manual evaluation of 100 (at most) randomly selected novel relations based on precision for English] and Hearst-like patterns (Panchenko et al, 2016). The Hearst patterns for all languages are extracted from Wikipedia and from focused crawls with seed pages that are Wikipedia pages.…”
Section: Participants and Results (mentioning)
confidence: 99%
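
As a rough illustration of what a Hearst-like pattern looks like, the sketch below handles a single English pattern, "X such as A, B and C". The patterns actually used by TAXI are not listed on this page, so the regular expression, the restriction to single-word terms, and the function name `hearst_such_as` are all assumptions made for the example.

```python
import re

# Separators between listed hyponyms; longer alternatives come first so
# that ", and " is not consumed as ", " followed by the word "and".
_SEP = r"(?:, and |, or |, | and | or )"

# Hypothetical single pattern: "<hypernym> such as <hyponym>(, <hyponym>)*".
# Terms are limited to single words to keep the sketch short.
_SUCH_AS = re.compile(rf"(\w+) such as (\w+(?:{_SEP}\w+)*)")


def hearst_such_as(sentence):
    """Return (hyponym, hypernym) pairs matched by the 'such as' pattern."""
    pairs = []
    for match in _SUCH_AS.finditer(sentence.lower()):
        hypernym = match.group(1)
        for hyponym in re.split(_SEP, match.group(2)):
            pairs.append((hyponym, hypernym))
    return pairs


if __name__ == "__main__":
    text = "Cured meats such as salami, mortadella and prosciutto are popular."
    for hypo, hyper in hearst_such_as(text):
        print(f"{hypo} -> {hyper}")
```

Note that the enumeration also absorbs a following "and" or "or" plus one word, so a sentence like "fruits such as apples and they sell well" would yield a spurious hyponym; a practical extractor would add noun-phrase chunking, multi-word terms, lemmatization, and frequency-based filtering on top of such patterns.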
“…In both tasks, two pattern-based methods (i.e., INRIASAC (Grefenstette, 2015) in TExEval and TAXI (Panchenko et al, 2016) in TExEval-2) consistently outperform others. INRIASAC uses frequency-based co-occurrence statistics and substring inclusion heuristics to extract a set of hypernyms for hyponyms.…”
Section: Results Analysis and Discussion (mentioning)
confidence: 99%
“…• Normalized Frequency Diff ($n_d$): Similar to [28], this feature is an asymmetric hypernymy score based on frequency counts. We compute $n_d(x_i, x_j)$ by first normalizing the frequency counts obtained (i.e., the counts in $E_k(x_i)$) for term $x_i$ as follows:…”
Section: Initial Subsequences Mortadella→sausage→meat→food Laksa→soup (mentioning)
confidence: 99%
“…Past approaches to taxonomy induction from scratch either assume the availability of a clean input vocabulary [28] or employ a time-consuming manual cleaning step over a noisy input vocabulary [38]. For example, Figure 1 shows the pipeline of a typical taxonomy induction approach from a domain corpus [38].…”
Section: Introduction (mentioning)
confidence: 99%