Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conferen 2019
DOI: 10.18653/v1/d19-1411

Learning Only from Relevant Keywords and Unlabeled Documents

Abstract: We consider a document classification problem where document labels are absent but only relevant keywords of a target class and unlabeled documents are given. Although heuristic methods based on pseudo-labeling have been considered, theoretical understanding of this problem has still been limited. Moreover, previous methods cannot easily incorporate well-developed techniques in supervised text classification. In this paper, we propose a theoretically guaranteed learning framework that is simple to implement and…

Cited by 12 publications (23 citation statements)
References 39 publications
“…In this section, we demonstrate how to apply the robustness result of symmetric losses to tackle a weakly supervised natural language processing task, namely learning only from relevant keywords and unlabeled documents [Charoenphakdee et al., 2019a].…”
Section: A Symmetric Loss Approach To Learning Only From Relevant Abs... (mentioning)
confidence: 99%
“…The bottleneck of the method proposed by Jin et al. [2017] is lack of flexibility of model choices and optimization algorithms. This makes it difficult to bring many [Figure 2: An overview of the framework for learning only from relevant keywords and unlabeled document [Charoenphakdee et al., 2019a]. Blue documents indicate positive documents and red documents denote negative documents in the two sets of documents divided by a pseudo-labeling algorithm.]…”
Section: Introduction (mentioning)
confidence: 99%
“…GE has been successfully applied on different tasks, such as text categorisation (Druck et al 2008) and language identification in mixed-language documents (King and Abney 2013). Similarly, Charoenphakdee et al (2019) proposed a theoretically grounded risk minimisation framework that directly optimises the area under the receiver operating characteristic curve (area under the curve) of a dataless classification model. Settles (2011) and Li and Yang (2018) both used multinomial naïve Bayes (MNB) for dataless classification.…”
Section: Dataless Classification (mentioning)
confidence: 99%