Phu T. V. Le scite author profile

Phu T. V. Le

4 Publications

26 Citation Statements Received

69 Citation Statements Given

Publications

Learning from noisy out-of-domain corpus using dataless classification

Jin

Wanvarie

Le³

2020

Nat. Lang. Eng.

In real-world applications, text classification models often suffer from a lack of accurately labelled documents. The available labelled documents may also be out of domain, making the trained model not able to perform well in the target domain. In this work, we mitigate the data problem of text classification using a two-stage approach. First, we mine representative keywords from a noisy out-of-domain data set using statistical methods. We then apply a dataless classification method to learn from the automatically selected keywords and unlabelled in-domain data. The proposed approach outperformed various supervised learning and dataless classification baselines by a large margin. We evaluated different keyword selection methods intrinsically and extrinsically by measuring their impact on the dataless classification accuracy. Last but not least, we conducted an in-depth analysis of the behaviour of the classifier and explained why the proposed dataless classification method outperformed supervised learning counterparts.
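The two-stage idea in this abstract can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not say which statistical method or which dataless classifier was used, so this sketch assumes a PMI-style score for keyword mining and swaps in a plain keyword-overlap vote for the classification step. It also does not learn from unlabelled in-domain data, which the paper's method does. The function names `mine_keywords` and `classify` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def mine_keywords(labelled_docs, top_k=2):
    """Stage 1 (assumed): rank words per label by a PMI-style score.

    labelled_docs is a list of (text, label) pairs from the noisy corpus.
    Returns a dict mapping each label to its top_k keywords.
    """
    doc_freq_by_label = defaultdict(Counter)  # label -> word -> document count
    doc_freq = Counter()                      # word -> document count overall
    label_count = Counter()                   # label -> number of documents
    for text, label in labelled_docs:
        label_count[label] += 1
        for word in set(text.lower().split()):
            doc_freq_by_label[label][word] += 1
            doc_freq[word] += 1
    n_docs = sum(label_count.values())

    keywords = {}
    for label, counts in doc_freq_by_label.items():
        def pmi(word):
            p_joint = counts[word] / n_docs
            p_word = doc_freq[word] / n_docs
            p_label = label_count[label] / n_docs
            return math.log(p_joint / (p_word * p_label))

        # Ties on PMI are broken by higher document frequency, then alphabetically.
        ranked = sorted(counts, key=lambda w: (-pmi(w), -counts[w], w))
        keywords[label] = ranked[:top_k]
    return keywords


def classify(text, keywords):
    """Stage 2 (simplified stand-in): pick the label with the most keyword hits."""
    tokens = set(text.lower().split())
    scores = {label: len(tokens & set(kws)) for label, kws in keywords.items()}
    return max(sorted(scores), key=scores.get)


noisy_corpus = [
    ("cheap flights hotel deals", "travel"),
    ("hotel booking flights", "travel"),
    ("stock market shares", "finance"),
    ("market shares bank", "finance"),
]
keywords = mine_keywords(noisy_corpus)
prediction = classify("looking for a hotel near the airport", keywords)
```

Only the mined keywords carry over to the second stage, so label noise in the out-of-domain corpus affects the classifier only through the keyword lists.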

Towards improving coherence and diversity of slogan generation

Jin

Bhatia²,

Wanvarie

et al. 2022

Nat. Lang. Eng.

Previous work in slogan generation focused on utilising slogan skeletons mined from existing slogans. While some generated slogans can be catchy, they are often not coherent with the company’s focus or style across their marketing communications because the skeletons are mined from other companies’ slogans. We propose a sequence-to-sequence (seq2seq) Transformer model to generate slogans from a brief company description. A naïve seq2seq model fine-tuned for slogan generation is prone to introducing false information. We use company name delexicalisation and entity masking to alleviate this problem and improve the generated slogans’ quality and truthfulness. Furthermore, we apply conditional training based on the first words’ part-of-speech tag to generate syntactically diverse slogans. Our best model achieved a ROUGE-1/-2/-L $\mathrm{F}_1$ score of 35.58/18.47/33.32. Besides, automatic and human evaluations indicate that our method generates significantly more factual, diverse and catchy slogans than strong long short-term memory and Transformer seq2seq baselines.
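As a rough illustration of what company name delexicalisation might look like as a preprocessing step, the sketch below replaces the company name with a placeholder before generation and restores it afterwards. The placeholder token `<company>`, the function names, and the case-insensitive exact-match strategy are all assumptions made for this example; the abstract does not describe the authors' actual procedure.

```python
import re

COMPANY_TOKEN = "<company>"  # hypothetical placeholder, not taken from the paper


def delexicalise(description: str, company_name: str) -> str:
    """Replace every case-insensitive occurrence of the company name with a placeholder."""
    pattern = re.compile(re.escape(company_name), flags=re.IGNORECASE)
    return pattern.sub(COMPANY_TOKEN, description)


def relexicalise(slogan: str, company_name: str) -> str:
    """Put the real company name back into a generated slogan."""
    return slogan.replace(COMPANY_TOKEN, company_name)


masked = delexicalise("Acme Corp builds rockets. ACME CORP ships fast.", "Acme Corp")
restored = relexicalise("<company>: rockets for everyone", "Acme Corp")
```

With the name masked, a model sees the same token for every company and so has less opportunity to copy or invent a different company's name in the output.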

Selecting Domain-Specific Concepts for Question Generation With Lightly-Supervised Methods

Jin

Le²

2016

In this paper we propose content selection methods for question generation (QG) which exploit domain knowledge. Traditionally, QG systems apply syntactical transformation on individual sentences to generate open domain questions. We hypothesize that a QG system informed by domain knowledge can ask more important questions. To this end, we propose two lightly-supervised methods to select salient target concepts for QG based on domain knowledge collected from a corpus. One method selects important semantic roles with bootstrapping and the other selects important semantic relations with Open Information Extraction (OpenIE). We demonstrate the effectiveness of the two proposed methods on heterogeneous corpora in the business domain. This work exploits domain knowledge in QG task and provides a promising paradigm to generate domain-specific questions.

Bridging the Gap Between Research and Production with CODE

Jin

Wanvarie

Le³

2019
