Large-scale hierarchical text classification without labelled data

Ha-Thuc, Viet; Renders, Jean-Michel

doi:10.1145/1935826.1935919

Cited by 15 publications

(6 citation statements)

References 23 publications

“…In particular, as big data analysis continues to be a major research trend, unsupervised or semisupervised learning is playing an important role in TC (Gliozzo, Strapparava, & Dagan, 2005; Ha-Thuc & Renders, 2011; Ko & Seo, 2009). Many new research issues remain for TC; however, these also are based on the term-weighting schemes of supervised-learning-based TC.…”

Section: Related Work (mentioning)

confidence: 99%

A new term‐weighting scheme for text classification using the odds of positive and negative class probabilities

2015

Asso for Info Science & Tech

The peculiarity of text classification that differs most from information retrieval is the existence of class information. Therefore, this paper proposes a new term weighting scheme that utilizes class information using positive and negative class distributions. As a result, the proposed scheme, log tf.TRR, consistently performs better than other schemes using class information, as well as traditional schemes such as tf.idf.

“…
• Thematic annotation with terms from the IPTC hierarchy [5]
• Thematic clustering of semantically homogeneous document fragments (hereafter, segments) into classes corresponding to the incident they report on.…”

Section: The Sync3 Domain (mentioning)

confidence: 99%

“…5 The framework is fully extensible and configurable with respect to storage mechanisms, inference engines, RDF file formats, query result formats, and query languages.…”

Section: Sesame Sails (mentioning)

confidence: 99%

POWDER and the multi million-triple store

Konstantopoulos, Archer

2011

Proceedings of the International Workshop on Semantic Web Information Management

In this paper we present and discuss the implementation and deployment of the Protocol for Web Description Resources (POWDER) W3C Recommendation for a large RDF repository containing millions of triples. POWDER enables taking advantage of natural groupings of URIs and their reflection on the denoted things' properties; our application implements a POWDER service that intercepts the API between the RDF store and the inference layer above it and provides annotations that appear as explicit statements to the inference service. The approach is tested on a multi million-triple store of news documents and events, where it achieves dramatic savings on storage space without impacting querying time.

“…To solve this problem [6] developed a system to hierarchically classify unlabelled data. As already mentioned, classifying data manually is extremely expensive and slows the classification process down.…”

Section: Introduction (mentioning)

confidence: 99%

“…Additionally, it grows to be an inefficient approach as with larger datasets the number of categories can exceed to thousands, of which each needs to be represented by a sufficient number of labelled documents. The system solves this issue by using ontological knowledge and by searching 'pseudo-relevant documents on the Web' [6]. With the ontology it is possible to create a hierarchical model including the context of ancestors among different classes.…”

Section: Introduction (mentioning)

confidence: 99%

Cloud-based Textual Analysis as a Basis for Document Classification

Weir, Owoeye, Oberacker, et al.

2018

2018 International Conference on High Performance Computing & Simulation (HPCS)

Growing trends in data mining and developments in machine learning have encouraged interest in analytical techniques that can contribute insights on data characteristics. The present paper describes an approach to textual analysis that generates extensive quantitative data on target documents, with output including frequency data on tokens, types, parts-of-speech and word n-grams. These analytical results enrich the available source data and have proven useful in several contexts as a basis for automating manual classification tasks. In the following, we introduce the Posit textual analysis toolset and detail its use in data enrichment as input to supervised learning tasks, including automating the identification of extremist Web content. Next, we describe the extension of this approach to the Arabic language. Thereafter, we recount the move of these analytical facilities from local operation to a Cloud-based service. This transition affords easy remote access for other researchers seeking to explore the application of such data enrichment to their own text-based data sets.
