Abstract: We present the Persian Question Answering Dataset (PQuAD), a crowdsourced reading comprehension dataset built on Persian Wikipedia articles. It includes 80,000 questions along with their answers, with 25% of the questions being adversarially unanswerable. We examine various properties of the dataset to show its diversity and its level of difficulty as a machine reading comprehension (MRC) benchmark. By releasing this dataset, we aim to ease research on Persian reading comprehension and the development of Persian question answering systems. Our experim…
“…There are Persian datasets for NLP tasks like question answering [12], [13], [14], language modeling [19], or sentiment analysis [20]. However, there is no Persian benchmark dataset for the NLU task.…”