Proceedings of the Fourth ACM International Conference on Web Search and Data Mining 2011
DOI: 10.1145/1935826.1935919

Large-scale hierarchical text classification without labelled data

Abstract: The traditional machine learning approaches for text classification often require labelled data for learning classifiers. However, when applied to large-scale classification involving thousands of categories, creating such labelled data is extremely expensive since typically the data is manually labelled by humans. Motivated by this, we propose a novel approach for large-scale hierarchical text classification which does not require any labelled data. We explore a perspective where the meaning of a category is …

Cited by 15 publications
(6 citation statements)
References 23 publications

“…In particular, as big data analysis continues to be a major research trend, unsupervised or semisupervised learning is playing an important role in TC (Gliozzo, Strapparava, & Dagan, 2005; Ha-Thuc & Renders, 2011; Ko & Seo, 2009). Many new research issues remain for TC; however, these also are based on the term-weighting schemes of supervised-learning-based TC.…”
Section: Related Work (mentioning)
confidence: 99%
“…• Thematic annotation with terms from the IPTC hierarchy [5] • Thematic clustering of semantically homogeneous document fragments (hereafter, segments) into classes corresponding to the incident they report on.…”
Section: The Sync3 Domain (mentioning)
confidence: 99%
“…5 The framework is fully extensible and configurable with respect to storage mechanisms, inference engines, RDF file formats, query result formats, and query languages.…”
Section: Sesame Sails (mentioning)
confidence: 99%
“…To solve this problem [6] developed a system to hierarchically classify unlabelled data. As already mentioned, classifying data manually is extremely expensive and slows the classification process down.…”
Section: Introduction (mentioning)
confidence: 99%
“…Additionally, it grows to be an inefficient approach as with larger datasets the number of categories can exceed to thousands, of which each needs to be represented by a sufficient number of labelled documents. The system solves this issue by using ontological knowledge and by searching 'pseudo-relevant documents on the Web' [6]. With the ontology it is possible to create a hierarchical model including the context of ancestors among different classes.…”
Section: Introduction (mentioning)
confidence: 99%