Seed Word Selection for Weakly-Supervised Text Classification with Unsupervised Error Estimation

Jin, Yiping; Bhatia, Akshay; Wanvarie, Dittaya

doi:10.18653/v1/2021.naacl-srw.14

Cited by 6 publications

(1 citation statement)

References 15 publications

(6 reference statements)


“…The method requires a handful of labeled keywords instead of a large corpus of labeled documents and can be easily transferred to new domains. We further evaluate the weakly-supervised models using unsupervised error estimation and perform automatic keyword selection (Jin et al., 2021a). Unsupervised error estimation is essential because no labeled development dataset is available in real-world problems where weakly-supervised text classification methods are applied. Finally, we tap on a state-of-the-art sequence-to-sequence Transformer model to generate cohesive and diverse advertising slogans from a short company description (Jin et al., In press).…”

mentioning

confidence: 99%

Natural language processing for digital advertising

Jin


Advertising is not only a marketing or sales activity but a particular form of two-way communication. In this thesis, we propose to apply the two main subtasks of natural language processing (NLP), namely natural language understanding (NLU) and natural language generation (NLG), to digital advertising to enhance the effectiveness of advertising. We apply weakly-supervised text classification to rapidly build text classifiers for contextual advertising (Jin et al., 2022). The method requires a handful of labeled keywords instead of a large corpus of labeled documents and can be easily transferred to new domains. We further evaluate the weakly-supervised models using unsupervised error estimation and perform automatic keyword selection (Jin et al., 2021a). Unsupervised error estimation is essential because no labeled development dataset is available in real-world problems where weakly-supervised text classification methods are applied. Finally, we tap on a state-of-the-art sequence-to-sequence Transformer model to generate cohesive and diverse advertising slogans from a short company description (Jin et al., In press). We prevent the model from hallucinating unsupported information using entity masking and generate diverse and catchy slogans using conditional training.

