Abstract. The problem of classification in supervised learning is a widely studied one. Nonetheless, there are scenarios that have received little attention despite their applicability. One such scenario is early text classification, where one needs to know the category of a document as soon as possible. The importance of this variant of the classification problem is evident in tasks like sexual predator detection, where one wants to identify an offender as early as possible. This paper presents a framework for e…
“…One reason for using the UCR datasets is simply that everyone does this, and that it helps making comparisons between the various proposed approaches. Another reason is that, in fact, applications where ECTS is useful exist [30,2,24,10,28,14,8] but they have still to realize their full potential. Companies are becoming more and more aware of the problem and are now starting to integrate ECTS into their projects.…”
Section: 2, mentioning
confidence: 99%
“…That the research carried out so far has not lead to applications is to our opinion a mistaken critic [30,2,24,10,28,14,8]. Real applications are: i) either hidden from the academic world because of their sensitive nature; ii) or under development, as this is a relatively new problem whose awareness is recent.…”
Section: Sum Up, mentioning
confidence: 99%
“…An increasing number of applications require to recognize the class of an incoming time series as quickly as possible without unduly compromising the accuracy of the prediction [30,2,24,10,28,14,8]. For example, in emergency wards of hospitals [17], in control rooms of national or international electrical power grids, in government councils assessing emergency situations, in all kinds of contexts, it is essential to make timely decisions in absence of complete knowledge of the true outcome (e.g.…”
Many approaches have been proposed for early classification of time series in light of its significance in a wide range of applications including healthcare, transportation and finance. However, a recent preprint on arXiv claims that all research done over almost 20 years on the Early Classification of Time Series is useless or, at the very least, ill-oriented because it severely lacks strong grounding. In this paper, we answer in detail the main issues and misunderstandings raised, and propose directions to further expand the fields of application of early classification of time series.
“…TEASER (Schäfer and Leser, 2020) or ECTS (Xing et al, 2012). However, there exists a key difference that prevents us from using such methods directly: An eSPD System never classifies a chat as nongrooming as long as there are still messages left (or expected), while an eTSC system at some stage might decide that it is safe to stop controlling the chat (Loyola et al, 2018). This opens the door to malicious attacks by using long and harmless openings in grooming attempts.…”
An important risk that children face today is online grooming, where a so-called sexual predator establishes an emotional connection with a minor online with the objective of sexual abuse. Prior work has sought to automatically identify grooming chats, but only after an incidence has already happened in the context of legal prosecution. In this work, we instead investigate this problem from the point of view of prevention. We define and study the task of early sexual predator detection (eSPD) in chats, where the goal is to analyze a running chat from its beginning and predict grooming attempts as early and as accurately as possible. We survey existing datasets and their limitations regarding eSPD, and create a new dataset called PANC for more realistic evaluations. We present strong baselines built on BERT that also reach state-of-the-art results for conventional SPD. Finally, we consider coping with limited computational resources, as real-life applications require eSPD on mobile devices.
“…Finally, (Loyola et al, 2018) considers the decision of "when to classify" as a problem to be learned on its own and trains two SVMs, one to make category predictions and the other to decide when to stop reading the stream. Nonetheless, the use of these two SVMs, again, hides the reasons behind both, the classification and the decision to stop early.…”
Section: Analysis Of Sequential Data: Early Classification, mentioning
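The two-model idea described in the statement above (one model predicts the category, a second decides when to stop reading) can be sketched as a simple loop over the incoming stream. This is only a minimal illustration, not the cited authors' implementation: the names `early_classify`, `toy_classify`, `toy_should_stop` and the word list `RISK_WORDS` are hypothetical, and the two SVMs are replaced by trivial keyword-based stand-ins so that the example stays self-contained.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical interfaces for the two learned models (SVMs in the cited work).
Classifier = Callable[[List[str]], Tuple[str, float]]  # prefix -> (label, score)
StopPolicy = Callable[[List[str], str, float], bool]   # (prefix, label, score) -> stop?


def early_classify(
    stream: Iterable[str],
    classify: Classifier,
    should_stop: StopPolicy,
) -> Tuple[Optional[str], int]:
    """Read messages one by one; return (label, messages_read).

    The loop ends as soon as the stop policy fires, or when the stream
    is exhausted, in which case the last prediction is returned.
    """
    prefix: List[str] = []
    label: Optional[str] = None
    for message in stream:
        prefix.append(message)
        label, score = classify(prefix)
        if should_stop(prefix, label, score):
            break
    return label, len(prefix)


# Toy stand-ins (assumptions for illustration only, not trained models).
RISK_WORDS = {"secret", "alone", "photo"}


def toy_classify(prefix: List[str]) -> Tuple[str, float]:
    """Label the prefix 'risk' once it contains at least two risk words."""
    tokens = [tok for msg in prefix for tok in msg.lower().split()]
    hits = sum(tok in RISK_WORDS for tok in tokens)
    score = hits / max(len(tokens), 1)
    return ("risk" if hits >= 2 else "no-risk"), score


def toy_should_stop(prefix: List[str], label: str, score: float) -> bool:
    """Stop early only on a 'risk' prediction; otherwise keep reading."""
    return label == "risk"


chat = ["hi there", "keep it secret", "are you alone", "ok bye"]
print(early_classify(chat, toy_classify, toy_should_stop))
```

In this sketch the stop policy never halts on a "no-risk" prediction, so a harmless opening cannot end monitoring early; a learned stop model would not necessarily have that property.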
With the rise of the Internet, there is a growing need to build intelligent systems that are capable of efficiently dealing with early risk detection (ERD) problems on social media, such as early depression detection, early rumor detection or identification of sexual predators. These systems, nowadays mostly based on machine learning techniques, must be able to deal with data streams since users provide their data over time. In addition, these systems must be able to decide when the processed data is sufficient to actually classify users. Moreover, since ERD tasks involve risky decisions by which people's lives could be affected, such systems must also be able to justify their decisions. However, most standard and state-of-the-art supervised machine learning models (such as SVM, MNB, Neural Networks, etc.) are not well suited to deal with this scenario. This is due to the fact that they either act as black boxes or do not support incremental classification/learning. In this paper we introduce SS3, a novel supervised learning model for text classification that naturally supports these aspects. SS3 was designed to be used as a general framework to deal with ERD problems. We evaluated our model on the CLEF's eRisk2017 pilot task on early depression detection. Most of the 30 contributions submitted to this competition used state-of-the-art methods. Experimental results show that our classifier was able to outperform these models and standard classifiers, despite being less computationally expensive and having the ability to explain its rationale.