Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management 2014
DOI: 10.1145/2661829.2662060
On Efficient Meta-Level Features for Effective Text Classification

Abstract: This paper addresses the problem of automatically learning to classify texts by exploiting information derived from meta-level features (i.e., features derived from the original bag-of-words representation). We propose new meta-level features derived from the class distribution, the entropy and the within-class cohesion observed in the k nearest neighbors of a given test document x, as well as from the distribution of distances of x to these neighbors. The set of proposed features is capable of transforming th…
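The kNN-derived meta-level features described in the abstract can be sketched as follows. This is a minimal illustration only, assuming cosine distance over dense document vectors; the function name, feature order, and exact statistics are assumptions, not the paper's definitions.

```python
# Sketch of kNN meta-level features in the spirit of the paper: class
# distribution, entropy, and distance statistics over the k nearest
# neighbours of a test document. Details are illustrative, not the
# paper's exact formulation.
import numpy as np

def knn_meta_features(x, train_X, train_y, k=3):
    """Return a meta-level feature vector for test document x."""
    # Cosine distance from x to every training document.
    norms = np.linalg.norm(train_X, axis=1) * np.linalg.norm(x)
    dists = 1.0 - (train_X @ x) / np.where(norms == 0, 1.0, norms)
    nn = np.argsort(dists)[:k]              # indices of the k nearest neighbours
    classes = np.unique(train_y)
    # Class distribution among the neighbours.
    dist_c = np.array([(train_y[nn] == c).mean() for c in classes])
    # Entropy of that distribution (low = the neighbourhood agrees on a class).
    p = dist_c[dist_c > 0]
    entropy = -(p * np.log2(p)).sum()
    # Distance statistics: mean and spread of the neighbour distances.
    return np.concatenate([dist_c, [entropy, dists[nn].mean(), dists[nn].std()]])

# Tiny worked example: two classes in 2-D.
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
y = np.array([0, 0, 1, 1])
feats = knn_meta_features(np.array([0.95, 0.05]), X, y, k=3)
```

The resulting vector can then be fed to any standard classifier in place of (or alongside) the bag-of-words representation.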

Cited by 13 publications (12 citation statements)
References 17 publications
“…As described in Sections 3.1 and 3.2 below, we use bin-based features to capture the characteristics of the differences between vectors and the distribution of word embeddings. This is similar to, e.g., [11], where meta-level features are proposed, in a text classification setting using the kNN algorithm, to exploit the distribution of the nearest-neighbour similarities and the within-class cohesion.…”
Section: Meta-level Features
confidence: 86%
“…In other words, the classifier used to predict the class of documents was not used in the construction phase of the document representation. In terms of text representations, we considered three alternatives, namely traditional term-weighting alternatives (term frequency-inverse document frequency [TFIDF]); weighting based on word and character (n-gram) frequency; and recent representations based on meta-features, which capture statistical information from a document's neighborhood and have obtained state-of-the-art effectiveness in recent benchmarks [35][36][37][38][39].…”
Section: Automatic Text Classification Methods
confidence: 99%
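The TFIDF weighting mentioned in the statement above can be sketched in a few lines. This is one common variant (raw term frequency times log inverse document frequency); the cited papers may use a different formulation, and the toy corpus below is invented for illustration.

```python
# Minimal TFIDF sketch: tf = raw term count, idf = log(N / df),
# where N is the corpus size and df the number of documents
# containing the term. One common variant among several.
import math
from collections import Counter

docs = [["meta", "features", "text"],
        ["text", "classification"],
        ["meta", "level", "features"]]

N = len(docs)
# Document frequency: count each term once per document.
df = Counter(term for doc in docs for term in set(doc))

def tfidf(doc):
    tf = Counter(doc)
    return {t: tf[t] * math.log(N / df[t]) for t in tf}

weights = tfidf(docs[1])
# "classification" occurs in only one document, so it gets weight
# log(3/1); "text" occurs in two of three, so only log(3/2).
```

Rarer terms thus receive higher weights, which is exactly the discriminative signal the term-weighting alternatives in the quoted statement rely on.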
“…In contrast, it is heavily dependent on the specialists and the coverage of the rules on the text expressions. More details about each of the exploited algorithms are provided in Multimedia Appendix 4 [3,35,37,39,[41][42][43][44][45][50][51][52][53][54][55][56][57][58][59][60][61][62][63].…”
Section: Automatic Text Classification Methods
confidence: 99%
“…There is an ongoing debate in the research community over whether additional features can improve the simple bag-of-words model. Some authors find significant improvements (Canuto et al. 2014), while others assert that NLP-derived features are about as good as bag-of-words (Godbole 2006). Owing to the predictive power of bag-of-words and bag-of-n-grams and their ease of use, especially in the predominant case of sentiment analysis, little research has been devoted to the investigation of more complex, NLP-based features.…”
Section: Figure 3: Architecture and Process Flow
confidence: 99%