Proceedings of the 14th ACM International Conference on Information and Knowledge Management 2005
DOI: 10.1145/1099554.1099703

Taxonomies by the numbers

Abstract: In this paper, we describe a system for the construction of taxonomies which yield high accuracies with automated categorization systems, even on Web and intranet documents. In particular, we describe the way in which measurement of five key features of the system can be used to predict when categories are sufficiently well defined to yield high accuracy categorization. We describe the use of this system to construct a large (8800-category) general-purpose taxonomy and categorization system.

Cited by 27 publications (4 citation statements)
References 21 publications (12 reference statements)
“…Proposed solution uses the cosine similarity function [19] to check the similarity between the extracted concepts representing two documents. The cosine similarity function is $\mathrm{sim}(s,d) = \frac{\sum_{t \in (s,d)} w_s(t) \cdot w_d(t)}{\sqrt{\sum_{t \in s} w_s(t)^2 \cdot \sum_{t \in d} w_d(t)^2}}$, where $s$, $d$ are the two documents, $w_s(t)$ is the weight of term $t$ in the $s$ document, and $w_d(t)$ is the weight of term $t$ in the $d$ document.…”
Section: Overall Architecture Of the Proposed Methodology
Citation type: mentioning
confidence: 99%
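
The cosine similarity quoted in the statement above can be illustrated with a minimal Python sketch. This is not the citing paper's implementation: the function name `cosine_similarity`, the dictionary representation of term weights, and the reading of $t \in (s,d)$ as "terms present in both documents" are all assumptions made for illustration.

```python
import math


def cosine_similarity(ws, wd):
    """Cosine similarity between two documents s and d.

    ws and wd are hypothetical dicts mapping each term t to its weight
    in document s and document d respectively (an assumed representation).
    """
    # Numerator: sum over terms shared by both documents of w_s(t) * w_d(t).
    shared_terms = set(ws) & set(wd)
    numerator = sum(ws[t] * wd[t] for t in shared_terms)

    # Denominator: square root of the product of the summed squared weights.
    sum_sq_s = sum(w * w for w in ws.values())
    sum_sq_d = sum(w * w for w in wd.values())
    denominator = math.sqrt(sum_sq_s * sum_sq_d)

    # Assumption: a document with no weighted terms is treated as
    # having similarity 0 to everything, to avoid division by zero.
    if denominator == 0.0:
        return 0.0
    return numerator / denominator
```

With non-negative weights the result lies between 0 (no shared terms) and 1 (weight vectors pointing in the same direction).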
“…Several works address taxonomy construction by exploiting well-known data mining techniques. Mostly proposed algorithms exploited clustering techniques to provide a well-founded structuring of concepts of interest (Dhillon & Modha, 2001; Clifton et al., 2004; Gates, Teiken, & Cheng, 2005; Ienco & Meo, 2008). The mostly used techniques are based on hierarchical clustering, which produces a set of nested clusters organized as a hierarchical tree, called dendrogram.…”
Section: Previous Work
Citation type: mentioning
confidence: 99%
“…The mostly used techniques are based on hierarchical clustering, which produces a set of nested clusters organized as a hierarchical tree, called dendrogram. The most relevant application context in which clustering techniques have been adopted to address taxonomy construction is the context of textual data analysis (Dhillon & Modha, 2001; Clifton et al., 2004; Gates et al., 2005; Ienco & Meo, 2008; Hofmann, 1999; Hatzivassiloglou, Gravano, & Maganti, 2000). However, the focus of our approach is different from the aforementioned one.…”
Section: Previous Work
Citation type: mentioning
confidence: 99%
“…Kiritchenko et al [32] dealt with hierarchical categorization, and introduced the notion of consistent classification. Gates and Teiken [20] described a system for the construction of taxonomies which yielded high accuracy for automated categorization systems. Dekel et al [15] formulated the hierarchical classification task as an optimization problem with varying margin constraints, and described new online and batch algorithms for solving it.…”
Section: Text Classification or Categorization
Citation type: mentioning
confidence: 99%