Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence 2020
DOI: 10.24963/ijcai.2020/549
Dataless Short Text Classification Based on Biterm Topic Model and Word Embeddings

Abstract: Dataless text classification has attracted increasing attention recently. It needs only a few seed words per category to classify documents, which is much cheaper than supervised text classification requiring massive labeling effort. However, most existing models focus on long texts and achieve unsatisfactory performance on short texts, which have become increasingly popular on the Internet. In this paper, we first propose a novel model named Seeded Biterm Topic Model (SeedBTM) …
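For intuition about the seed-word idea in the abstract, the sketch below classifies a document by its average embedding similarity to each category's seed words. This is a simplification, not the paper's SeedBTM model (which couples this idea with a seeded biterm topic model); the `embeddings` lookup is an assumed pre-trained resource such as GloVe vectors.

```python
import numpy as np

# Illustrative dataless classification with seed words and pre-trained
# word embeddings. A simplification for intuition only; SeedBTM itself
# combines embedding similarity with a seeded biterm topic model.

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def classify(doc_words, seed_words, embeddings):
    """Assign the category whose seed words are most similar on average."""
    scores = {}
    for category, seeds in seed_words.items():
        sims = [cosine(embeddings[w], embeddings[s])
                for w in doc_words if w in embeddings
                for s in seeds if s in embeddings]
        scores[category] = float(np.mean(sims)) if sims else 0.0
    return max(scores, key=scores.get)

# Example: two categories, each described by a handful of seed words.
seed_words = {"sports": ["game", "team", "player"],
              "technology": ["software", "computer", "internet"]}
```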

Cited by 13 publications (5 citation statements)
References 1 publication
“…At an early stage, some researchers used auxiliary knowledge bases such as Wikipedia to establish the semantic correlation between texts and labels [3,29]. Subsequently, topic-model-based methods emerged [4,13,14,33,34], which inferred category-aware topics from a limited set of seed words. In recent years, neural methods have gained prominence [22,23,31,36,39].…”
Section: Related Work 2.1 Weakly Supervised Text Classification (mentioning)
confidence: 99%
“…First, we associate each topic $z$ with an individual attribute value $c_q$, and initialize the states of the Markov chain randomly, as in BTM. Next, inspired by (Yang et al. 2020), we define the conditional distribution $P(c_q \mid \mathbf{c}_{\neg b_{i,h,l}}, \mathfrak{B}, \alpha, \beta)$ for each biterm $b_{i,h,l}$ in the biterm set $\mathfrak{B}$ by combining the biterm-attribute-value similarity score $\Omega(b_{i,h,l}, c_q)$ with the conditional distribution $P(c_q \mid \mathbf{c}_{\neg b_{i,h,l}}, \mathfrak{B}, \alpha, \beta)$ (Formula 1) as follows:…”
Section: Attribute Knowledge Integration (AKI) Module (mentioning)
confidence: 99%
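For intuition, here is a minimal Python sketch of such a combined sampling step: the standard collapsed-BTM conditional is weighted by the similarity score $\Omega$. The counter names (`n_c`, `n_cw`) and the `omega` argument are illustrative assumptions, and the multiplicative combination is assumed since the quoted statement truncates before Formula 1.

```python
import numpy as np

def sample_category(biterm, n_c, n_cw, alpha, beta, V, omega):
    """Draw a category assignment for one biterm (w1, w2).

    n_c[q]     -- number of biterms currently assigned to category q
    n_cw[q][w] -- count of word w currently assigned to category q
    omega[q]   -- similarity score Omega(biterm, c_q), e.g. from cosine
                  similarity of word and seed-word embeddings
    """
    w1, w2 = biterm
    K = len(n_c)
    p = np.zeros(K)
    for q in range(K):
        # Standard collapsed-BTM conditional, up to normalization ...
        btm_part = ((n_c[q] + alpha)
                    * (n_cw[q][w1] + beta) * (n_cw[q][w2] + beta)
                    / (2 * n_c[q] + V * beta) ** 2)
        # ... combined multiplicatively with the similarity score
        # (the exact combination rule is an assumption here).
        p[q] = btm_part * omega[q]
    p /= p.sum()
    return np.random.choice(K, p=p)
```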
“…Specifically, ConWea (Mekala and Shang 2020) can utilize user-provided seed words to create a contextualized utterance corpus, which is further leveraged to train an utterance classifier and expand seed words iteratively. SeedBTM (Yang et al. 2020) can utilize user-provided seed words to extend BTM into an utterance classifier based on the word embedding technique. LOTClass (Meng et al. 2020) generates attribute-indicative words for each attribute value to fine-tune a PLM on a word-level category prediction task, and then performs self-training on unlabeled utterances.…”
Section: Effectiveness Study (mentioning)
confidence: 99%
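The self-training step mentioned for LOTClass follows a generic pseudo-labeling pattern. The sketch below shows that pattern with a TF-IDF and logistic-regression stand-in, an assumption made to keep the example self-contained; the actual method fine-tunes a pre-trained language model instead.

```python
import numpy as np
from scipy.sparse import vstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Generic self-training / pseudo-labeling loop. The TF-IDF + logistic
# regression classifier is a stand-in for the fine-tuned PLM used by
# LOTClass, chosen only for self-containedness.

def self_train(labeled_texts, labels, unlabeled_texts, rounds=3, thresh=0.9):
    vec = TfidfVectorizer()
    X_all = vec.fit_transform(labeled_texts + unlabeled_texts)
    X, y = X_all[: len(labeled_texts)], list(labels)
    X_u = X_all[len(labeled_texts):]
    clf = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        clf.fit(X, y)
        if X_u.shape[0] == 0:
            break
        proba = clf.predict_proba(X_u)
        confident = np.flatnonzero(proba.max(axis=1) >= thresh)
        if confident.size == 0:
            break
        # Move high-confidence pseudo-labeled examples into the training set.
        X = vstack([X, X_u[confident]])
        y += list(clf.classes_[proba[confident].argmax(axis=1)])
        X_u = X_u[np.setdiff1d(np.arange(X_u.shape[0]), confident)]
    return clf, vec
```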