Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Langua 2016
DOI: 10.18653/v1/n16-1061

Breaking the Closed World Assumption in Text Classification

Abstract: Existing research on multiclass text classification mostly makes the closed world assumption, which focuses on designing accurate classifiers under the assumption that all test classes are known at training time. A more realistic scenario is to expect unseen classes during testing (open world). In this case, the goal is to design a learning system that classifies documents of the known classes into their respective classes and also to reject documents from unknown classes. This problem is called open (world) c…

Citations: cited by 109 publications (80 citation statements)
References: 23 publications (31 reference statements)
“…The performance of the resulting model (i.e., ER+POG) on the OSQ and IPA datasets is depicted in Table II and Table III, respectively. (Footnote 4) Note that $\mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP}+\mathrm{TN}}$ and $\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}$, where FP, TP, TN, and FN are the numbers of false positives, true positives, true negatives, and false negatives, respectively. It can be seen that our model outperforms all the baselines on both datasets significantly, which demonstrates the effectiveness of the generated pseudo OOD utterances.…”
Section: E. Effects of Generated Pseudo OOD Utterances (mentioning)
confidence: 99%
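The FPR and TPR definitions quoted in this statement can be checked with a few lines of Python. This is a minimal sketch, not code from the citing paper: the function name `rates`, the label strings, and the choice of "ood" as the positive class are assumptions made for illustration.

```python
def rates(y_true, y_pred, positive="ood"):
    """Return (FPR, TPR), treating `positive` as the positive class.

    FPR = FP / (FP + TN) and TPR = TP / (TP + FN).
    The label "ood" is a hypothetical name for out-of-domain inputs.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # correctly flagged as positive
            else:
                fp += 1  # known-class input wrongly flagged
        else:
            if truth == positive:
                fn += 1  # positive input that was missed
            else:
                tn += 1  # known-class input correctly accepted
    # Guard against empty denominators when one class is absent.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    return fpr, tpr


y_true = ["ood", "ood", "in", "in", "in", "in"]
y_pred = ["ood", "in", "ood", "in", "in", "in"]
print(rates(y_true, y_pred))
```

Here one of the two out-of-domain inputs is detected and one of the four in-domain inputs is wrongly flagged, so the call yields an FPR of 0.25 and a TPR of 0.5.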
“…Recently, various deep neural network based NLU models have been proposed, and some of these models have been applied in real-world applications [1]-[3]. Most existing neural NLU modules are built by following a closed-world assumption [4], [5], i.e., the data used in the training and testing phases are drawn from the same distribution. However, such an assumption is commonly violated in practical systems that are deployed in a dynamic or open environment.…”
Section: Introduction (mentioning)
confidence: 99%
“…The main idea is that a classifier should not cover too much open space with few or no training data, thereby rejecting the unknown images. cbsSVM [10] shares similar ideas in text classification. However, these methods are all based on SVM, which fails to effectively capture the high-level semantic concept of intents compared with deep neural networks.…”
Section: Related Work (mentioning)
confidence: 94%
“…How do we detect unknown intent without any prior knowledge about it? In [9,10], an m-class classifier should be able to reject examples from the unknown class while performing m-class classification tasks. This is because not all test classes are known in the training set, which forms an (m+1)-class classification problem where the (m+1)-th class represents the unknown class.…”
Section: Introduction (mentioning)
confidence: 99%
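The (m+1)-class formulation in this statement can be illustrated with a reject option layered on top of an ordinary m-class classifier. The sketch below is an assumption-laden illustration, not the method of the cited works: the function `predict_open`, the label `"unknown"`, the example class names, and the default threshold are all hypothetical.

```python
UNKNOWN = "unknown"  # hypothetical label standing in for the (m+1)-th class


def predict_open(probs, threshold=0.5):
    """Map per-class scores for m known classes to one of m+1 labels.

    `probs` is a dict from known class name to a score in [0, 1].
    If no known class reaches `threshold` (an assumed value), the
    input is rejected and assigned to the unknown class.
    """
    best = max(probs, key=probs.get)  # most likely known class
    return best if probs[best] >= threshold else UNKNOWN


print(predict_open({"sports": 0.9, "politics": 0.2}))
print(predict_open({"sports": 0.3, "politics": 0.2}))
```

The first call accepts the input as "sports"; the second rejects it, because neither known class is confident enough. How the scores are produced and how the threshold is chosen are left open here.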
“…Although most research papers on text classification deal with the problem of closed-set classification, some methods have been recently proposed to tackle open-set text classification. Fei and Liu proposed a center-based similarity method, which is based on a decision threshold (usually 0.5) on a posteriori probability to reject an observation as unrecognized, where the probabilities are estimated from SVM scores using Platt's algorithm. Doan and Kalita proposed an algorithm called nearest class mean, which attempts to find boundary regions for known classes using spheres centered at class centroids, with observations falling outside the sphere boundaries treated as either outliers or indicators of possible new unknown classes.…”
Section: Introduction – Problem Formulation (mentioning)
confidence: 99%
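The sphere-based idea described in this statement (known classes bounded by spheres centered at class centroids, with outside points treated as unknown) can be sketched roughly as follows. This is not Doan and Kalita's algorithm: the radius rule (largest training distance to the centroid), the function names `fit_spheres` and `classify`, and the toy data are assumptions chosen to keep the example short.

```python
import math


def fit_spheres(X, y):
    """Return {label: (centroid, radius)} for each known class.

    Assumption: the radius is the largest distance from any training
    point of the class to its centroid. The cited work may define
    the boundary differently.
    """
    groups = {}
    for point, label in zip(X, y):
        groups.setdefault(label, []).append(point)
    spheres = {}
    for label, pts in groups.items():
        dim = len(pts[0])
        centroid = [sum(p[i] for p in pts) / len(pts) for i in range(dim)]
        radius = max(math.dist(p, centroid) for p in pts)
        spheres[label] = (centroid, radius)
    return spheres


def classify(x, spheres):
    """Return the nearest class if x lies inside its sphere, else None."""
    label, (centroid, radius) = min(
        spheres.items(), key=lambda item: math.dist(x, item[1][0])
    )
    return label if math.dist(x, centroid) <= radius else None


X = [[0, 0], [0, 2], [10, 10], [10, 12]]
y = ["a", "a", "b", "b"]
spheres = fit_spheres(X, y)
print(classify([0, 1.5], spheres))  # inside the sphere of class "a"
print(classify([5, 5], spheres))    # outside every sphere
```

A point that falls outside every sphere returns `None`, which a caller could treat as an outlier or as evidence of a new class, mirroring the two readings mentioned in the statement.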