2022
DOI: 10.48550/arxiv.2201.12888
Preprint

A Dataset for Medical Instructional Video Classification and Question Answering

Abstract: This paper introduces a new challenge and datasets to foster research toward designing systems that can understand medical videos and provide visual answers to natural language questions. We believe medical videos may provide the best possible answers to many first aid, medical emergency, and medical education questions. Toward this, we created the MedVidCL and MedVidQA datasets and introduce the tasks of Medical Video Classification (MVC) and Medical Visual Answer Localization (MVAL), two tasks that focus on …

Cited by 3 publications (15 citation statements)
References 23 publications
“…The Medical Video Question Answering (MedVidQA) dataset [6] is the first video question answering (VQA) dataset [45] constructed for natural language video localization (NLVL) [21,46]; it pairs medical instructional videos with text question queries. Three medical informatics experts were asked to formulate medical and health-related instructional questions by watching the given videos.…”
Section: Datasets
Confidence: 99%
“…Following prior works [6,21,26,47,48], we adopt "R@n, IoU = 𝜇" and "mIoU" as the evaluation metrics, which treat localization of frames in the video as a span prediction task, similar to answer span prediction [49,50] in text-based question answering. "R@n, IoU = 𝜇" denotes the percentage of language queries having at least one result among the top-n retrieved moments whose Intersection over Union (IoU) with the ground truth is larger than 𝜇.…”
Section: Evaluation Metrics
Confidence: 99%
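The metrics quoted above can be sketched in a few lines. This is a minimal illustration, not the evaluation code of the cited works: the function names (`temporal_iou`, `recall_at_n`, `mean_iou`) and the sample spans are hypothetical, and segments are assumed to be (start, end) pairs in seconds with predictions ranked by score.

```python
def temporal_iou(pred, gold):
    """IoU between two (start, end) temporal segments."""
    inter = max(0.0, min(pred[1], gold[1]) - max(pred[0], gold[0]))
    union = (pred[1] - pred[0]) + (gold[1] - gold[0]) - inter
    return inter / union if union > 0 else 0.0

def recall_at_n(predictions, golds, n, mu):
    """"R@n, IoU = mu": fraction of queries with at least one of the
    top-n predicted moments whose IoU with the ground truth exceeds mu."""
    hits = sum(
        any(temporal_iou(p, g) >= mu for p in preds[:n])
        for preds, g in zip(predictions, golds)
    )
    return hits / len(golds)

def mean_iou(predictions, golds):
    """mIoU: average IoU of each query's top-1 predicted moment."""
    return sum(
        temporal_iou(preds[0], g) for preds, g in zip(predictions, golds)
    ) / len(golds)
```

For example, with ground truths `[(10, 20), (0, 5)]` and top-1 predictions `[(12, 22)], [(6, 9)]`, the first query overlaps by 8 s over a 12 s union (IoU ≈ 0.67) and the second not at all, so R@1 at 𝜇 = 0.5 is 0.5 and mIoU is about 0.33.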