Kiet Van Nguyen scite author profile

A Vietnamese Dataset for Evaluating Machine Reading Comprehension

Over 97 million people worldwide speak Vietnamese as their native language. However, there are few research studies on machine reading comprehension (MRC) for Vietnamese, the task of understanding a text and answering questions related to it. Because benchmark datasets for Vietnamese are lacking, we present the Vietnamese Question Answering Dataset (UIT-ViQuAD), a new dataset for evaluating MRC models on a low-resource language such as Vietnamese. This dataset comprises over 23,000 human-generated question-answer pairs based on 5,109 passages from 174 Vietnamese Wikipedia articles. In particular, we propose a new process of dataset creation for Vietnamese MRC. Our in-depth analyses show that the dataset requires abilities beyond simple reasoning such as word matching, and demands both single-sentence and multiple-sentence inference. We also run state-of-the-art MRC methods developed for English and Chinese as the first experimental models on UIT-ViQuAD. In addition, we estimate human performance on the dataset and compare it with the results of powerful machine learning models. The substantial gap between human performance and the best model performance indicates that improvements can be made on UIT-ViQuAD in future research. Our dataset is freely available on our website to encourage the research community to overcome challenges in Vietnamese MRC.

A Large-Scale Dataset for Hate Speech Detection on Vietnamese Social Media Texts

Luu, Nguyen, Nguyen, 2021

UIT-VSFC: Vietnamese Students’ Feedback Corpus for Sentiment Analysis

Nguyen et al. 2018

Variants of Long Short-Term Memory for Sentiment Analysis on Vietnamese Students’ Feedback Corpus

Nguyen, 2018

VLSP 2021 - ViMRC Challenge: Vietnamese Machine Reading Comprehension

Nguyen, Tran, Nguyen et al. 2022

Preprint

Comparison Between Traditional Machine Learning Models And Neural Network Models For Vietnamese Hate Speech Detection

Luu, Nguyen, Nguyen et al. 2020

Enhancing Lexical-Based Approach With External Knowledge for Vietnamese Multiple-Choice Machine Reading Comprehension

et al. 2020

Although Vietnamese is the 17th most popular native-speaker language in the world, there are not many research studies on Vietnamese machine reading comprehension (MRC), the task of understanding a text and answering questions about it. One reason is the lack of high-quality benchmark datasets for this task. In this work, we construct a dataset of 2,783 multiple-choice question-answer pairs based on 417 Vietnamese texts that are commonly used to teach reading comprehension to elementary school pupils. In addition, we propose a lexical-based MRC method that uses semantic similarity measures and external knowledge sources to analyze questions and extract answers from the given text. We compare the performance of the proposed model with several lexical-based and neural network-based baseline models. Our proposed method achieves an accuracy of 61.81%, which is 5.51% higher than the best baseline model. We also measure human performance on our dataset and find a large gap between machine and human performance, which indicates that significant progress can still be made on this task. The dataset is freely available on our website for research purposes.
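To make the idea of a lexical-based multiple-choice baseline concrete, here is a minimal sketch of a sliding-window word-overlap scorer. It is not the method proposed in the abstract: it uses no semantic similarity measures and no external knowledge, and the function names (`tokenize`, `window_overlap`, `choose_option`, `accuracy`), the window size, and the English toy passage are all assumptions made for illustration.

```python
from typing import List, Sequence, Set


def tokenize(text: str) -> List[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(".,;:!?\"'()") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def window_overlap(passage: Sequence[str], query: Set[str], size: int) -> int:
    """Largest number of distinct query words found in any window of `size` tokens."""
    if not passage:
        return 0
    best = 0
    # Passages shorter than the window are scored as a single window.
    for start in range(max(1, len(passage) - size + 1)):
        window = set(passage[start:start + size])
        best = max(best, len(window & query))
    return best


def choose_option(passage: str, question: str, options: Sequence[str], size: int = 12) -> int:
    """Return the index of the option whose words, joined with the question's, best match the passage."""
    passage_tokens = tokenize(passage)
    question_tokens = set(tokenize(question))
    scores = [
        window_overlap(passage_tokens, question_tokens | set(tokenize(option)), size)
        for option in options
    ]
    # Ties resolve to the first option with the highest score.
    return scores.index(max(scores))


def accuracy(predictions: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of questions answered correctly."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


if __name__ == "__main__":
    # Hypothetical toy item, not taken from the dataset described above.
    passage = ("Lan lives in a small village near the river. "
               "Every morning she walks to school with her brother.")
    question = "Who does Lan walk to school with?"
    options = ["her teacher", "her brother", "her mother", "her dog"]
    print(options[choose_option(passage, question, options)])
```

A scorer like this only rewards exact word matches, which is the kind of limitation that semantic similarity measures and external knowledge sources are meant to address.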

Deep Learning versus Traditional Classifiers on Vietnamese Students’ Feedback Corpus

Nguyen, Hong, Nguyen et al. 2018

Student feedback is an important source of students' opinions for improving the quality of training activities. By applying sentiment analysis to student feedback data, we can determine the sentiment polarities that reveal problems in the institution, so that the necessary changes can be made to improve the quality of teaching and learning. This study focuses on machine learning and natural language processing techniques (Naive Bayes, Maximum Entropy, Long Short-Term Memory, and Bi-Directional Long Short-Term Memory) applied to the Vietnamese Students' Feedback Corpus collected from a university. The final results were compared and evaluated to find the most effective model based on different evaluation criteria. The experimental results show that the Bi-Directional Long Short-Term Memory (Bi-LSTM) algorithm outperformed the three other algorithms in terms of F1-score, with 92.0% on the sentiment classification task and 89.6% on the topic classification task. In addition, we developed a sentiment analysis application for analyzing student feedback. The application will help the institution recognize students' opinions about a problem and identify shortcomings that still exist. With this application, the institution can propose appropriate methods to improve the quality of training activities in the future.

A Vietnamese Dataset for Evaluating Machine Reading Comprehension
