2022
DOI: 10.48550/arxiv.2202.06219
Preprint

PQuAD: A Persian Question Answering Dataset

Abstract: We present the Persian Question Answering Dataset (PQuAD), a crowdsourced reading comprehension dataset built on Persian Wikipedia articles. It contains 80,000 questions with their answers, and 25% of the questions are adversarially designed to be unanswerable. We examine various properties of the dataset to show its diversity and its level of difficulty as a machine reading comprehension (MRC) benchmark. By releasing this dataset, we aim to facilitate research on Persian reading comprehension and the development of Persian question answering systems. Our experim…
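The abstract states that the dataset has 80,000 questions and that 25% of them are adversarially unanswerable, which works out to about 20,000 questions with no answer span in the passage. One way to check that ratio on a local copy is sketched here. It assumes a SQuAD 2.0-style JSON layout (`data` → `paragraphs` → `qas`, with an `is_impossible` flag on each question). That schema, the function name `count_unanswerable` and the toy record are illustrative assumptions. They are not PQuAD's documented release format.

```python
from typing import Any


def count_unanswerable(dataset: dict[str, Any]) -> tuple[int, int]:
    """Return (total_questions, unanswerable_questions) for a SQuAD 2.0-style dict.

    ASSUMPTION: the layout is data -> paragraphs -> qas, with an `is_impossible`
    flag per question. This mirrors SQuAD 2.0 and is not confirmed for PQuAD.
    """
    total = 0
    unanswerable = 0
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                total += 1
                # Treat a question as unanswerable if it is flagged or has no gold answers.
                if qa.get("is_impossible", False) or not qa.get("answers"):
                    unanswerable += 1
    return total, unanswerable


# Hypothetical toy sample (not taken from PQuAD): one answerable, one unanswerable question.
toy = {
    "data": [
        {
            "title": "example",
            "paragraphs": [
                {
                    "context": "...",
                    "qas": [
                        {"id": "q1", "question": "...", "is_impossible": False,
                         "answers": [{"text": "...", "answer_start": 0}]},
                        {"id": "q2", "question": "...", "is_impossible": True,
                         "answers": []},
                    ],
                }
            ],
        }
    ]
}

total, unanswerable = count_unanswerable(toy)
print(total, unanswerable, unanswerable / total)  # 2 1 0.5

# Scale check using the figures in the abstract: 25% of 80,000 questions.
print(int(80_000 * 0.25))  # 20000
```

Checking both the flag and an empty `answers` list makes the count robust if a file marks unanswerable questions in only one of the two ways.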

Cited by 1 publication (1 citation statement). References 10 publications.
“…There are Persian datasets for NLP tasks like question answering [12], [13], [14], language modeling [19], or sentiment analysis [20]. However, there is no Persian benchmark dataset for the NLU task.…”
Section: Description of Persian Dataset; citation type: mentioning (confidence: 99%)