2014
DOI: 10.9781/ijimai.2014.254

Graph-based Techniques for Topic Classification of Tweets in Spanish

Abstract: Topic classification of texts is one of the most interesting challenges in Natural Language Processing (NLP). Topic classifiers commonly use a bag-of-words approach, in which the classifier uses (and is trained with) selected terms from the input texts. In this work we present techniques based on graph similarity to classify short texts by topic. In our classifier we build graphs from the input texts, and then use properties of these graphs to classify them. We have tested the resulting algorithm by classifyin…
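
The abstract describes building a graph from each input text and classifying on properties of that graph, but the visible portion does not say how the graphs are constructed or which properties are used. The following is only a minimal sketch under assumptions that are not from the paper: nodes are lowercased whitespace tokens, edges link words that appear within a small sliding window, and the "properties" are a few basic graph metrics. The function names `build_cooccurrence_graph` and `graph_features`, the `window` parameter, and the sample sentence are all hypothetical.

```python
def build_cooccurrence_graph(text, window=2):
    """Build an undirected word graph as {word: set_of_neighbours}.

    Assumption (not from the paper): two words are linked when they
    occur within `window` consecutive tokens of each other.
    """
    tokens = text.lower().split()
    graph = {tok: set() for tok in tokens}
    for i, tok in enumerate(tokens):
        # Look ahead at the next (window - 1) tokens.
        for other in tokens[i + 1 : i + window]:
            if other != tok:  # skip self-loops
                graph[tok].add(other)
                graph[other].add(tok)
    return graph


def graph_features(graph):
    """Return simple graph metrics that could serve as classifier features."""
    n = len(graph)
    # Each undirected edge is stored twice, once per endpoint.
    m = sum(len(neighbours) for neighbours in graph.values()) // 2
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    avg_degree = 2 * m / n if n else 0.0
    return {"nodes": n, "edges": m, "density": density, "avg_degree": avg_degree}


# Hypothetical short Spanish text standing in for a tweet.
features = graph_features(
    build_cooccurrence_graph("el gobierno aprueba el presupuesto")
)
```

A feature dictionary like `features` could then be fed to any standard classifier, or graphs could be compared against per-topic reference graphs; which of these, if either, the paper uses is not stated in the visible text.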

Citations: cited by 16 publications (12 citation statements)
References: 9 publications
“…A problem that has been widely studied is how to find the characteristic words of a document (e.g., [44][45][46][47]). These characteristic words can be used, for instance, to implement keyword-based document search.…”
Section: Characteristic Words Of a Document (mentioning)
confidence: 99%
“…As mentioned above, the base resources used for this task are Hunspell, Twokenize, and NetLingo. These resources were selected because they have been widely and successfully used to pre-process tweets in other research works, such as [26], [27], [33], [34], [35], [36], [37].…”
Section: Text Pre-processing (mentioning)
confidence: 99%
“…Many different models have been proposed over the years. The most prominent of these are Hidden Markov Models (HMM), Maximum Entropy Markov Models (MEMM), SVM or even a vector classification model for which the features are not terms, but graph metrics [47] and CRF. Other studies used unsupervised machine learning methods; a class of problems in which one seeks to determine how the data are organized such as clustering; a common technique for statistical data analysis used in many fields as used in [48].…”
Section: Information Extraction and Methods (mentioning)
confidence: 99%