A Vietnamese Dataset for Evaluating Machine Reading Comprehension

Nguyen, Kiet Van; Nguyen, Vu; Nguyen, Anh H. T.; Nguyen, Ngan Luu-Thuy

doi:10.18653/v1/2020.coling-main.233

Cited by 136 publications

(174 citation statements)

References 22 publications

Mentioning: 170

“…For the machine reading comprehension model, the Document Reader (DrQA) introduced by Chen et al. [1] is a powerful model on various machine reading comprehension corpora such as SQuAD [11], TextWorldsQA [8], and UIT-ViQuAD [10]. The DrQA model consists of two modules: Document Retriever and Document Reader.…”

Section: Methodologies (mentioning)

confidence: 99%

“…Many MRC corpora are constructed on specific domains and open domains in English, such as SQuAD [11] (extractive MRC) on Wikipedia articles, RACE [4] (multiple-choice MRC) on high school students' English exams, and NarrativeQA [7] (abstractive MRC) on books and stories. For the Vietnamese language, UIT-ViQuAD [10] (Wikipedia domain) and ViNewQA [15] (health news domain) are two extractive MRC corpora. Besides, ViMMRC [9] is a multiple-choice reading comprehension corpus built on Vietnamese primary school textbooks.…”

Section: Related Work (mentioning)

confidence: 99%

Conversational Machine Reading Comprehension for Vietnamese Healthcare Texts

Luu, Bui, Nguyen et al. 2021

Preprint

Machine reading comprehension (MRC) is a sub-field of natural language processing that aims to help computers understand unstructured texts and then answer questions related to them. In practice, conversation is an essential way to communicate and transfer information. To help machines understand conversation texts, we present UIT-ViCoQA, a new corpus for conversational machine reading comprehension in the Vietnamese language. This corpus consists of 10,000 questions with answers over 2,000 conversations about health news articles. We then evaluate several baseline approaches for conversational machine comprehension on the UIT-ViCoQA corpus. The best model obtains an F1 score of 45.27%, which is 30.91 points behind human performance (76.18%), indicating that there is ample room for improvement.

“…Three more languages have their versions of SQuAD [210]: French [66,126], Vietnamese [187], and Korean [150],…”

Section: Monolingual Resources (mentioning)

confidence: 99%

QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension

Rogers, Gardner, Augenstein 2021

Preprint

Alongside huge volumes of research on deep learning models in NLP in the recent years, there has been also much work on benchmark datasets needed to track modeling progress. Question answering and reading comprehension have been particularly prolific in this regard, with over 80 new datasets appearing in the past two years. This study is the largest survey of the field to date. We provide an overview of the various formats and domains of the current resources, highlighting the current lacunae for future work. We further discuss the current classifications of "reasoning types" in question answering and propose a new taxonomy. We also discuss the implications of over-focusing on English,

“…The output of Pyserini is then reranked by a T5 language model [10], which is fine-tuned on MS MARCO, a large machine reading comprehension dataset [18]. Similarly, SLEDGE [19] uses a similar approach, but using SciBERT [13] to rerank documents.…”

Section: Background and Significance (mentioning)

confidence: 99%

A term-based and citation network-based search system for COVID-19

et al. 2021

The COVID-19 pandemic resulted in an unprecedented production of scientific literature spanning several fields. To facilitate navigation of the scientific literature related to various aspects of the pandemic, we developed an exploratory search system. The system is based on automatically identified technical terms, document citations, and their visualization, accelerating identification of relevant documents. It offers a multi-view interactive search and navigation interface, bringing together unsupervised approaches of term extraction and citation analysis. We conducted a user evaluation with domain experts, including epidemiologists, biochemists, medicinal chemists, and medicine students. In general, most users were satisfied with the relevance and speed of the search results. More interestingly, participants mostly agreed on the capacity of the system to enable exploration and discovery of the search space using the graph visualization and filters. The system is updated on a weekly basis and it is publicly available at http://www.nactem.ac.uk/cord/.
