We consider a document classification problem where document labels are absent and only relevant keywords of a target class and unlabeled documents are given. Although heuristic methods based on pseudo-labeling have been proposed, theoretical understanding of this problem remains limited. Moreover, previous methods cannot easily incorporate well-developed techniques from supervised text classification. In this paper, we propose a theoretically guaranteed learning framework that is simple to implement and allows flexible choices of models, e.g., linear models or neural networks. We demonstrate how to optimize the area under the receiver operating characteristic curve (AUC) effectively and also discuss how to adapt the framework to optimize other well-known evaluation metrics such as accuracy and the F1-measure. Finally, we show the effectiveness of our framework using benchmark datasets.
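The AUC optimized above can be read as the probability that a randomly drawn positive document is scored higher than a randomly drawn negative one. As a minimal illustration of this pairwise view (not the paper's surrogate-loss optimizer; the function name is ours):

```python
from itertools import product

def pairwise_auc(pos_scores, neg_scores):
    """Empirical AUC: the fraction of (positive, negative) score pairs
    ranked correctly. Ties count as half a correct pair."""
    pairs = list(product(pos_scores, neg_scores))
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs)
    return correct / len(pairs)
```

In practice the indicator `p > n` is replaced by a differentiable surrogate (e.g., a sigmoid or hinge on the score difference) so the classifier can be trained by gradient descent.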
In real-world applications, text classification models often suffer from a lack of accurately labelled documents. The available labelled documents may also be out of domain, so the trained model may not perform well in the target domain. In this work, we mitigate the data problem of text classification using a two-stage approach. First, we mine representative keywords from a noisy out-of-domain data set using statistical methods. We then apply a dataless classification method to learn from the automatically selected keywords and unlabelled in-domain data. The proposed approach outperformed various supervised learning and dataless classification baselines by a large margin. We evaluated different keyword selection methods intrinsically and extrinsically by measuring their impact on dataless classification accuracy. Finally, we conducted an in-depth analysis of the classifier's behaviour and explained why the proposed dataless classification method outperformed its supervised learning counterparts.
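One simple statistical keyword-mining criterion of the kind described above is a smoothed relative-frequency ratio between target-class documents and background documents (a hypothetical sketch, not the paper's exact scoring function):

```python
from collections import Counter

def score_keywords(target_docs, background_docs, smoothing=1.0):
    """Score each word by how much more frequent it is (relatively)
    in the target documents than in the background documents."""
    tgt = Counter(w for d in target_docs for w in d.lower().split())
    bg = Counter(w for d in background_docs for w in d.lower().split())
    tgt_n = sum(tgt.values())
    bg_n = sum(bg.values())
    return {
        w: ((tgt[w] + smoothing) / (tgt_n + smoothing))
           / ((bg[w] + smoothing) / (bg_n + smoothing))
        for w in tgt
    }
```

Words with the highest scores would then be handed to the dataless classifier as class-representative keywords; real systems typically add stop-word filtering and significance tests on top of this.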
We propose an active learning framework for sequence labeling tasks. In each iteration, a set of subsequences is selected and manually labeled, while the other parts of the sequences are left unannotated. Learning stops automatically when the training data does not change significantly between consecutive iterations. We evaluate the proposed framework on chunking and named entity recognition data provided by CoNLL. Experimental results show that we match the fully supervised F1 score with only 6.98% and 7.01% of tokens annotated, respectively.
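One common way to pick subsequences in such a framework is a least-confidence heuristic: query the spans where the model's per-token confidence is lowest. The abstract does not specify the selection criterion, so the following is only an illustrative sketch under that assumption:

```python
def select_subsequences(confidences, window, budget):
    """Pick up to `budget` non-overlapping fixed-length windows with the
    lowest mean model confidence, for manual annotation.
    confidences: per-token confidence in [0, 1] for one sequence."""
    n = len(confidences)
    candidates = [
        (sum(confidences[i:i + window]) / window, i)
        for i in range(n - window + 1)
    ]
    candidates.sort()  # lowest mean confidence first
    chosen, used = [], set()
    for _, i in candidates:
        span = range(i, i + window)
        if used.isdisjoint(span) and len(chosen) < budget:
            chosen.append((i, i + window))
            used.update(span)
    return sorted(chosen)
```

The tokens outside the selected windows stay unannotated, which is what lets the framework reach supervised-level F1 with only ~7% of tokens labeled.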
Previous work in slogan generation focused on utilising slogan skeletons mined from existing slogans. While some generated slogans can be catchy, they are often not coherent with the company's focus or style across its marketing communications because the skeletons are mined from other companies' slogans. We propose a sequence-to-sequence (seq2seq) Transformer model to generate slogans from a brief company description. A naïve seq2seq model fine-tuned for slogan generation is prone to introducing false information. We use company name delexicalisation and entity masking to alleviate this problem and improve the generated slogans' quality and truthfulness. Furthermore, we apply conditional training based on the first word's part-of-speech tag to generate syntactically diverse slogans. Our best model achieved a ROUGE-1/-2/-L F1 score of 35.58/18.47/33.32. In addition, automatic and human evaluations indicate that our method generates significantly more factual, diverse and catchy slogans than strong long short-term memory and Transformer seq2seq baselines.
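Delexicalisation in pipelines like this typically swaps the company name for a placeholder token before training and restores it after decoding, so the model can never copy or corrupt the name. A minimal sketch (the placeholder token and function names are our assumptions, not the paper's implementation):

```python
def delexicalise(description, company_name, placeholder="<company>"):
    """Replace the company name in the input so the model sees a
    neutral placeholder instead of a copyable surface form."""
    return description.replace(company_name, placeholder)

def relexicalise(slogan, company_name, placeholder="<company>"):
    """Substitute the real company name back into the generated slogan."""
    return slogan.replace(placeholder, company_name)
```

Entity masking extends the same idea to other named entities (products, locations) in the description, reducing the model's opportunity to hallucinate false facts about them.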