Early text classification: a Naïve solution

Gómez, Manuel Montes y; Villaseñor, L.; Errecalde, Marcelo Luis

doi:10.18653/v1/w16-0416

Cited by 11 publications

(16 citation statements)

References 13 publications


“…The window size chosen was w = 3, that is, three terms were read between each run of the early text classification framework. Based on Escalante's work [3] we chose a naïve Bayes classifier for the CPI model. The performance for the partial documents can be seen in Fig 2. Clearly, we can classify documents without reading all terms.…”

Section: Experiments and Results (mentioning)

confidence: 99%
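The windowed reading described in the statement above can be sketched as follows. This is a minimal illustration, not the citing paper's framework: `read_in_windows`, `classify_early`, the toy keyword classifier, and the fixed confidence threshold are all hypothetical. The citing paper learns when to stop, whereas this sketch assumes a simple threshold purely to make the loop concrete.

```python
from typing import Callable, Iterator, List, Optional, Tuple


def read_in_windows(terms: List[str], w: int = 3) -> Iterator[List[str]]:
    """Yield growing prefixes of `terms`, extended by `w` terms per step."""
    for end in range(w, len(terms) + w, w):
        yield terms[:min(end, len(terms))]


def classify_early(
    terms: List[str],
    classify: Callable[[List[str]], Tuple[str, float]],
    should_stop: Callable[[float, int], bool],
    w: int = 3,
) -> Tuple[Optional[str], int]:
    """Run the classifier after every window; stop once `should_stop` agrees.

    Returns the last predicted label and the number of terms read.
    """
    label, read = None, 0
    for prefix in read_in_windows(terms, w):
        label, confidence = classify(prefix)
        read = len(prefix)
        if should_stop(confidence, read):
            break
    return label, read


# Hypothetical stand-ins for a real partial-information classifier
# and a real stopping policy.
def toy_classify(prefix: List[str]) -> Tuple[str, float]:
    if "goal" in prefix:
        return "sports", 0.9
    return "other", 0.5


def toy_should_stop(confidence: float, terms_read: int) -> bool:
    return confidence >= 0.8  # assumed threshold, not taken from the paper


terms = "the match ended with a late goal from the striker".split()
print(classify_early(terms, toy_classify, toy_should_stop))  # ('sports', 9)
```

Here the decision is made after 9 of the 10 terms, because the third window is the first one that contains the decisive keyword.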

“…In [3] the authors propose an adaptation of Naïve Bayes to tackle the problem of classification with partial information. Although they achieve similar performance to state of the art models that read the entire document, they do not approach the DMC problem.…”

Section: Related Work (mentioning)

confidence: 99%

“…To date, only a few papers have approached this kind of scenarios [2,3,5]. Despite its low popularity, this topic has a major potential in practical applications.…”

Section: Introduction (mentioning)

confidence: 99%


Learning When to Classify for Early Text Classification

Loyola

Errecalde

Gómez

2018

Communications in Computer and Information Science

Self Cite


Abstract. The problem of classification in supervised learning is a widely studied one. Nonetheless, some scenarios have received little attention despite their applicability. One such scenario is early text classification, where one needs to know the category of a document as soon as possible. The importance of this variant of the classification problem is evident in tasks like sexual predator detection, where one wants to identify an offender as early as possible. This paper presents a framework for early text classification which highlights the two main pieces involved in this problem: classification with partial information and deciding the moment of classification. In this context, a novel approach that learns the second component (when to classify) and an adaptation of a temporal measurement for multi-class problems are introduced. Results on a classical text classification corpus, compared against a model that reads entire documents, confirm the feasibility of our approach.



“…For instance, some works have addressed early text classification by using diverse techniques like modifications of Naive Bayes (Escalante et al, 2016), profile-based representations (Escalante et al, 2017), and Multi-Resolution Concept Representations (López-Monroy et al, 2018). Those approaches have focused on quantifying prediction performance of the classifiers when using partial information in documents, that is, by considering how well they behave when incremental percentages of documents are provided to the classifier.…”

Section: Analysis Of Sequential Data: Early Classification (mentioning)

confidence: 99%
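The evaluation protocol mentioned in this statement, measuring how a classifier behaves as incremental percentages of each document are revealed, might look like the sketch below. The function names, the rounding rule (`math.ceil`), and the toy keyword classifier are assumptions made for illustration; none of them come from the cited works.

```python
import math
from typing import Callable, Dict, List, Sequence


def prefix_by_percentage(terms: List[str], pct: float) -> List[str]:
    """Return the first `pct` percent of `terms`, rounding the length up."""
    n = math.ceil(len(terms) * pct / 100)
    return terms[:n]


def accuracy_by_percentage(
    docs: Sequence[List[str]],
    labels: Sequence[str],
    predict: Callable[[List[str]], str],
    percentages: Sequence[int] = (10, 20, 30, 40, 50, 60, 70, 80, 90, 100),
) -> Dict[int, float]:
    """Accuracy of `predict` when only a prefix of each document is visible."""
    results = {}
    for pct in percentages:
        correct = sum(
            predict(prefix_by_percentage(doc, pct)) == label
            for doc, label in zip(docs, labels)
        )
        results[pct] = correct / len(docs)
    return results


# Hypothetical classifier: any model exposing "terms -> label" would fit here.
def toy_predict(terms: List[str]) -> str:
    return "sports" if "goal" in terms else "other"


docs = [["a", "b", "c", "goal"], ["x", "y", "z", "w"]]
labels = ["sports", "other"]
print(accuracy_by_percentage(docs, labels, toy_predict, percentages=(50, 100)))
# {50: 0.5, 100: 1.0}
```

In the toy data the decisive term sits at the end of the first document, so accuracy is lower at 50% than at 100%; plotting such a curve is one way to compare early classifiers.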

A text classification framework for simple and effective early depression detection over social media streams

Burdisso

Errecalde

Montes-y-Gómez

2019

Expert Systems with Applications

137


With the rise of the Internet, there is a growing need to build intelligent systems that are capable of efficiently dealing with early risk detection (ERD) problems on social media, such as early depression detection, early rumor detection or identification of sexual predators. These systems, nowadays mostly based on machine learning techniques, must be able to deal with data streams since users provide their data over time. In addition, these systems must be able to decide when the processed data is sufficient to actually classify users. Moreover, since ERD tasks involve risky decisions by which people's lives could be affected, such systems must also be able to justify their decisions. However, most standard and state-of-the-art supervised machine learning models (such as SVM, MNB, Neural Networks, etc.) are not well suited to deal with this scenario. This is due to the fact that they either act as black boxes or do not support incremental classification/learning. In this paper we introduce SS3, a novel supervised learning model for text classification that naturally supports these aspects. SS3 was designed to be used as a general framework to deal with ERD problems. We evaluated our model on the CLEF's eRisk2017 pilot task on early depression detection. Most of the 30 contributions submitted to this competition used state-of-the-art methods. Experimental results show that our classifier was able to outperform these models and standard classifiers, despite being less computationally expensive and having the ability to explain its rationale.


“…The key aspect of the work is a Markov Decision Process (MDP), where each sentence is modeled in a TFIDF vector. More recently, (Escalante et al, 2016) proposed a straightforward solution for early detection scenarios by using the naïve Bayes classifier. The idea consists in training with full documents, but when partial information has to be classified, the maximum a posteriori probability was estimated over the available text.…”

Section: Related Work (mentioning)

confidence: 99%
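The idea summarized in this statement, training naïve Bayes on full documents and then taking the maximum a posteriori class over whatever text is available, can be illustrated with a small multinomial model. This is a hedged sketch rather than the cited authors' implementation: the class name `PartialNaiveBayes`, the Laplace smoothing, the choice to skip out-of-vocabulary terms, and the toy data are all assumptions.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


class PartialNaiveBayes:
    """Multinomial naive Bayes fitted on full documents, queried on prefixes."""

    def fit(self, docs: Sequence[List[str]], labels: Sequence[str]):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for terms, label in zip(docs, labels):
            self.term_counts[label].update(terms)
        self.vocab = {t for counts in self.term_counts.values() for t in counts}
        return self

    def log_scores(self, prefix: List[str]) -> Dict[str, float]:
        """Unnormalized log posterior of each class given only `prefix`."""
        n_docs = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            total = sum(self.term_counts[label].values())
            score = math.log(n / n_docs)  # class prior
            for term in prefix:
                if term not in self.vocab:
                    continue  # assumption: ignore terms unseen in training
                # Laplace-smoothed term likelihood (assumed smoothing scheme)
                count = self.term_counts[label][term]
                score += math.log((count + 1) / (total + len(self.vocab)))
            scores[label] = score
        return scores

    def predict(self, prefix: List[str]) -> str:
        """Maximum a posteriori class for the text read so far."""
        scores = self.log_scores(prefix)
        return max(scores, key=scores.get)


train_docs = [
    "cheap pills buy now".split(),
    "buy cheap watches".split(),
    "meeting at noon".split(),
    "lunch meeting tomorrow".split(),
]
train_labels = ["spam", "spam", "ham", "ham"]
model = PartialNaiveBayes().fit(train_docs, train_labels)

document = "buy cheap tickets for the meeting".split()
print(model.predict(document[:2]))  # only "buy cheap" has been read: spam
```

Because the model factorizes over terms, scoring a prefix needs no retraining: the product of likelihoods simply runs over fewer terms, which is what makes naïve Bayes a natural fit for partial information.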

Early Text Classification Using Multi-Resolution Concept Representations

López-Monroy

González

Montes

et al. 2018

Proceedings of the 2018 Conference of the North American Chapter Of the Association for Computational Linguistics: Hu

Self Cite


This paper proposes a novel document representation, called Multi-Resolution Representation (MulR), to improve the early detection of risks in social media sources. The goal is to effectively identify the potential risk using as little evidence as possible and with as much anticipation as possible. MulR allows us to generate multiple "views" of the text. These views capture different semantic meanings for words and documents at different levels of granularity, which is very useful in early scenarios to model the variable amounts of evidence. The experimental evaluation shows that MulR using low resolution is better suited for modeling short documents (very early stages), whereas large documents (medium/late stages) are better modeled with higher resolutions. We evaluate the proposed ideas in two different tasks where anticipation is critical: sexual predator detection and depression detection. The experimental evaluation for these early tasks revealed that the proposed approach outperforms previous methodologies by a considerable margin.
