DOI: 10.1007/978-3-540-85845-4_9

Multi-value Classification of Very Short Texts

Abstract: We introduce a new stacking-like approach for multi-value classification. We apply this classification scheme using Naive Bayes, Rocchio and kNN classifiers on the well-known Reuters dataset. We use part-of-speech tagging for stopword removal. We show that our setup performs almost as well as other approaches that use the full article text even though we only classify headlines. Finally, we apply a Rocchio classifier on a dataset from a Web 2.0 site and show that it is suitable for semi-automated lab…
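The Rocchio classifier named in the abstract assigns a headline to the class whose term centroid is most similar to the headline's bag-of-words vector. A minimal sketch of that idea (the tokenization, toy data, and class names here are illustrative, not the paper's Reuters setup):

```python
from collections import Counter
from math import sqrt

def bow(text):
    """Bag-of-words vector as a term -> count mapping."""
    return Counter(text.lower().split())

def centroid(docs):
    """Average the bag-of-words vectors of all docs in one class."""
    total = Counter()
    for d in docs:
        total.update(bow(d))
    n = len(docs)
    return {t: c / n for t, c in total.items()}

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rocchio_train(labelled):
    """labelled: {class: [headline, ...]} -> {class: centroid}."""
    return {c: centroid(docs) for c, docs in labelled.items()}

def rocchio_classify(centroids, headline):
    """Pick the class whose centroid is closest in cosine similarity."""
    v = bow(headline)
    return max(centroids, key=lambda c: cosine(centroids[c], v))

# Toy training data standing in for Reuters-style headlines.
train = {
    "grain": ["wheat prices rise", "corn harvest delayed"],
    "crude": ["oil output cut", "crude futures climb"],
}
model = rocchio_train(train)
print(rocchio_classify(model, "wheat harvest report"))  # → grain
```

Because headlines are very short, almost every surviving term is discriminative, which is why the paper pairs such classifiers with part-of-speech-based stopword removal.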

Cited by 7 publications (11 citation statements)
References 7 publications
“…The most prominent approach to adapt classifiers for multi-labeling is binary relevance [26,28]. Other options include the chaining [21] as well as stacking [9,27] of classifiers. While the former is not well-suited for large numbers of labels, we also include a variation of the latter idea in our comparison.…”
Section: Related Work
confidence: 99%
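Binary relevance, the decomposition this statement names as the most prominent multi-label adaptation, turns one multi-label problem into one independent binary problem per label; chaining additionally feeds earlier labels' predictions into later classifiers. A minimal sketch of the binary-relevance decomposition (documents and label names are toy examples, not from the paper):

```python
def binary_relevance_datasets(docs):
    """docs: list of (text, set_of_labels).
    Returns {label: [(text, 0 or 1), ...]} — one binary
    training set per label, each trainable in isolation."""
    labels = set().union(*(ls for _, ls in docs))
    return {
        lab: [(text, int(lab in ls)) for text, ls in docs]
        for lab in sorted(labels)
    }

docs = [
    ("oil prices climb",       {"crude"}),
    ("wheat exports to rise",  {"grain", "trade"}),
    ("opec cuts crude output", {"crude", "trade"}),
]
per_label = binary_relevance_datasets(docs)
print(sorted(per_label))  # → ['crude', 'grain', 'trade']
print(per_label["crude"])
```

The per-label independence is what makes binary relevance scale linearly in the number of labels, whereas a classifier chain must fix an ordering and grows more fragile as the label set grows.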
“…As meta-classifiers, we use decision trees with Gini impurity as the splitting criterion. To limit complexity, we generate training data only for those meta-classifiers whose class is among the top 30 of the base classifier's ranking [9]. We use this decision tree module (abbreviated with the suffix *DT) as an alternative to hard cut-offs in Learning to Rank (see Section 2) and the fixed thresholds in multi-layer perceptrons (see Section 3.2.2).…”
Section: Multi-label Adaption
confidence: 99%
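The cited setup ranks labels with a base classifier and then lets one per-label meta-classifier (a decision tree there) accept or reject each of the top-30 candidates. A schematic sketch of that meta step, with simple per-label score thresholds standing in for trained decision trees (all names, scores, and thresholds here are illustrative):

```python
def meta_decide(ranking, meta, k=30):
    """ranking: [(label, score), ...] sorted by score, descending.
    meta: {label: decision_fn(score) -> bool} — one stand-in
    meta-classifier per label (decision trees in the cited setup).
    Only the top-k candidates ever reach a meta decision,
    which keeps the number of meta training examples bounded."""
    accepted = []
    for label, score in ranking[:k]:
        decide = meta.get(label, lambda s: False)
        if decide(score):
            accepted.append(label)
    return accepted

# Base-classifier ranking over toy labels.
ranking = [("crude", 0.9), ("trade", 0.4), ("grain", 0.1)]
# Threshold functions standing in for per-label decision trees.
meta = {
    "crude": lambda s: s > 0.5,
    "trade": lambda s: s > 0.3,
    "grain": lambda s: s > 0.5,
}
print(meta_decide(ranking, meta))  # → ['crude', 'trade']
```

Compared with a fixed global threshold, a learned per-label decision lets each label have its own acceptance behaviour, which is the point of this stacking variant.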
“…comments, reviews or web searches) has been intensively studied since 2008 [15,46,27]. There are great benefits in being able to analyse short texts; for example, advertisers might be interested in the sentiment of product reviews on e-commerce sites so they can match marketing material to content more effectively.…”
Section: Introduction
confidence: 99%