Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020
DOI: 10.18653/v1/2020.nlpcovid19-2.31

Collecting Verified COVID-19 Question Answer Pairs

Abstract: We release a dataset of over 2,100 COVID-19 related Frequently Asked Question-Answer pairs scraped from over 40 trusted websites. We include an additional 24,000 questions pulled from online sources that have been aligned by experts with existing answered questions from our dataset. This paper describes our efforts in collecting the dataset and summarizes the resulting data. Our dataset is automatically updated daily and available at https://github.com/JHU-COVID-QA/scraping-qas. So far, this data has been us…

Cited by 14 publications (14 citation statements)
References 2 publications
“…Automating knowledge extraction of the latest information from reliable sources to automatically create question and answer pairs and update the chatbot’s knowledge base is a viable solution to this challenge. 38, 39 …”
Section: Discussion (mentioning)
confidence: 99%
“…The specialized, closed-domain knowledge base of the system in its instantiation for this experiment consists of questions and answers related to COVID-19 that were extracted from two sources. The first batch was pulled from a project that scraped COVID-19 data from 40 trusted websites, created question and answer pairs based on this data, and employed health care experts to determine the relevance of said pairs to unanswered questions [29]. For our purposes, pairs with a rating below 90 were filtered out and duplicates were removed.…”
Section: Datasets (mentioning)
confidence: 99%
“…The most related works to ours are Sun and Sedoc (2020) and Poliak et al (2020), both of which constructed a collection of COVID-19 FAQs by scraping authoritative websites. However, the dataset in the former work is not available yet and the latter work does not evaluate models on their dataset, and there is still a great need to understand how existing models would perform on the COVID-19 FAQ retrieval task.…”
Section: Related Work (mentioning)
confidence: 99%
“…We developed scrapers 4 adapted from Poliak et al (2020), and add special features to COUGH dataset. Web scraping: We collect FAQ items from authoritative international organizations, state governments and other credible websites including reliable encyclopedias and medical forums.…”
Section: FAQ Bank Construction (mentioning)
confidence: 99%