DOC: Deep Open Classification of Text Documents

Shu, Lei; Xu, Hongwu; Liu, Bing

doi:10.18653/v1/d17-1314

Cited by 242 publications

(227 citation statements)

References 25 publications


“…Note that small perturbations of each input are added to the last feature layer in this baseline. 4) DOC [28]: m binary classifiers are built for m classes.…”

Section: Baselines (mentioning)

confidence: 99%

Out-of-Domain Detection for Natural Language Understanding in Dialog Systems

Zheng, Chen, Huang (2020)

IEEE/ACM Trans. Audio Speech Lang. Process.

Natural Language Understanding (NLU) is a vital component of dialogue systems, and its ability to detect Out-of-Domain (OOD) inputs is critical in practical applications, since accepting an OOD input that the current system does not support may lead to catastrophic failure. However, most existing OOD detection methods rely heavily on manually labeled OOD samples and cannot take full advantage of unlabeled data. This limits the feasibility of these models in practical applications. In this paper, we propose a novel model that generates high-quality pseudo OOD samples akin to In-Domain (IND) input utterances, and thereby improves the performance of OOD detection. To this end, an autoencoder is trained to map an input utterance into a latent code, and the codes of IND and OOD samples are trained to be indistinguishable by means of a generative adversarial network. To provide more supervision signals, an auxiliary classifier is introduced to regularize the generated OOD samples to have indistinguishable intent labels. Experiments show that the pseudo OOD samples generated by our model can effectively improve OOD detection in NLU. We also demonstrate that the effectiveness of these pseudo OOD data can be further improved by efficiently utilizing unlabeled data.
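
The baseline description quoted above, DOC building m binary classifiers for m classes, amounts to a one-vs-rest sigmoid output layer with a reject option. The following is a minimal pure-Python sketch, assuming the decision rule reported for DOC (reject when every class probability falls below its own threshold, otherwise take the arg max over seen classes). The names `sigmoid`, `doc_predict`, and the `"unseen"` label are hypothetical, and the logits stand in for the output of a trained text classifier.

```python
import math
from typing import List, Sequence, Union


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def doc_predict(
    logits: Sequence[float],
    thresholds: Sequence[float],
    unknown: str = "unseen",
) -> Union[int, str]:
    """Open-set prediction from m one-vs-rest sigmoid outputs.

    logits: one raw score per seen class (hypothetical stand-in for model output).
    thresholds: one cutoff t_i per seen class.
    Returns `unknown` when every class probability is below its own threshold;
    otherwise returns the index of the most probable seen class.
    """
    probs: List[float] = [sigmoid(z) for z in logits]
    if all(p < t for p, t in zip(probs, thresholds)):
        return unknown
    return max(range(len(probs)), key=lambda i: probs[i])


# Each class is scored independently, so the probabilities need not sum to 1.
print(doc_predict([2.0, -1.0, 0.5], [0.5, 0.5, 0.5]))     # 0
print(doc_predict([-2.0, -1.0, -0.5], [0.5, 0.5, 0.5]))  # unseen
```

Because each sigmoid is independent, an input can score low on every seen class, which is what allows it to be rejected; a softmax layer would force the scores to sum to 1 and always favour some seen class.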

“…Text classification tasks in real-world applications often consist of two components, In-Domain (ID) classification and Out-of-Domain (OOD) detection (Kim and Kim, 2018; Shu et al., 2017; Shamekhi et al., 2018). ID classification refers to classifying a user's input with a label that exists in the training data, and OOD detection refers to assigning a special OOD tag to the input when it does not belong to any of the labels in the ID training dataset (Dai et al., 2007).…”

Section: Introduction (mentioning)

confidence: 99%

Out-of-Domain Detection for Low-Resource Text Classification Tasks

Tan, Wang, et al. (2019)

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conferen

Out-of-domain (OOD) detection for low-resource text classification is a realistic but understudied task. The goal is to detect OOD cases with limited in-domain (ID) training data, since we observe that training data is often insufficient in machine learning applications. In this work, we propose an OOD-resistant Prototypical Network to tackle this zero-shot OOD detection and few-shot ID classification task. Evaluation on real-world datasets shows that the proposed solution outperforms state-of-the-art methods on the zero-shot OOD detection task while maintaining competitive performance on the ID classification task.
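
To make the prototype idea in this abstract concrete, here is a minimal sketch of generic nearest-prototype classification with a distance cutoff for rejecting OOD inputs. It is not Tan et al.'s OOD-resistant training objective, which this page does not describe: the embeddings are plain hypothetical vectors, each prototype is the mean of its class's few support examples, and `max_dist` is an assumed rejection threshold on squared Euclidean distance. All function names are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def prototypes(support: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Average each class's support embeddings into a single prototype vector."""
    protos: Dict[str, List[float]] = {}
    for label, vecs in support.items():
        dim = len(vecs[0])
        protos[label] = [sum(v[k] for v in vecs) / len(vecs) for k in range(dim)]
    return protos


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query: Vector, protos: Dict[str, List[float]], max_dist: float) -> str:
    """Nearest-prototype label, or "OOD" if even the closest prototype is too far."""
    label, dist = min(
        ((lbl, sq_dist(query, p)) for lbl, p in protos.items()),
        key=lambda pair: pair[1],
    )
    return label if dist <= max_dist else "OOD"


# Few-shot support set: two hypothetical 2-D embeddings per in-domain class.
support = {"weather": [[1.0, 0.0], [0.8, 0.2]], "music": [[0.0, 1.0], [0.2, 0.8]]}
protos = prototypes(support)
print(classify([0.95, 0.05], protos, max_dist=0.1))  # weather
print(classify([5.0, 5.0], protos, max_dist=0.1))    # OOD
```

With no OOD examples available at training time, the only signal for rejection here is distance from every in-domain prototype, which mirrors the zero-shot OOD setting the abstract describes.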

“…We used the one-vs-rest logistic regression instead of the multinomial logistic regression in order to obtain a probability cutoff of 0.5 to determine the unknown cell type. DOC was an advanced machine learning method for classifying unseen text documents, which was inherently similar to our problem and could be directly applied here [38]. The key idea of DOC was to find a data-driven probability cutoff for the unknown class rather than using a fixed probability cutoff of 0.5 as LR did.…”

Section: Comparison Approaches (mentioning)

confidence: 99%

Unifying single-cell annotations based on the Cell Ontology

Sheng, Pisco, Karkanias, et al. (2019)

Preprint

Single-cell technologies have rapidly generated an unprecedented amount of data that enables us to understand biological systems at single-cell resolution. However, analyzing datasets generated by independent labs remains challenging due to a lack of consistent terminology to describe cell types. Here, we present OnClass, an algorithm and accompanying software for automatically classifying cells into cell types represented by a controlled vocabulary derived from the Cell Ontology. Cell type similarity is inferred from distances in the Cell Ontology, so a key advantage of OnClass is its ability to annotate cell types that are not present in the training set by using the hierarchical structure of the vocabulary space. We applied OnClass to diverse collections of single-cell transcriptomics data from both mouse and human and observed substantial improvement in automated cell type annotation. We further demonstrated how OnClass can be used to identify marker genes for cell types present and absent in the training set, suggesting that OnClass can serve as a tool to associate marker genes with each term of the Cell Ontology, offering the possibility of refining the Cell Ontology using a data-centric approach.
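
The citation above credits DOC with replacing the fixed 0.5 cutoff by a data-driven one. A minimal sketch of such a per-class cutoff follows, assuming the Gaussian-fitting procedure described in the DOC paper: the sigmoid outputs of class i on its own training examples are treated as one half of a Gaussian centred at 1, mirrored around 1 to complete it, and the threshold is set to t_i = max(0.5, 1 - α·σ_i). The choice α = 3, the population-variance estimate, and the name `doc_threshold` are assumptions made for illustration.

```python
import math
from typing import Sequence


def doc_threshold(probs: Sequence[float], alpha: float = 3.0, floor: float = 0.5) -> float:
    """Per-class rejection cutoff estimated from one class's own training scores.

    probs: sigmoid outputs p(y = i | x_j) for training examples x_j of class i.
    Each point p is mirrored to 2 - p, so the combined set has mean exactly 1;
    its standard deviation sigma gives the cutoff max(floor, 1 - alpha * sigma).
    """
    points = list(probs) + [2.0 - p for p in probs]
    mean = 1.0  # each (p, 2 - p) pair averages to 1 by construction
    sigma = math.sqrt(sum((x - mean) ** 2 for x in points) / len(points))
    return max(floor, 1.0 - alpha * sigma)


# Confident, tightly clustered scores give a cutoff well above 0.5 ...
print(round(doc_threshold([0.9, 0.95, 0.99, 0.8]), 3))  # 0.656
# ... while widely spread scores fall back to the 0.5 floor.
print(doc_threshold([0.6, 0.7, 0.9]))                   # 0.5
```

One cutoff would be computed per seen class and then used at prediction time: an input is labelled unknown only when every class probability falls below its own cutoff.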