2022
DOI: 10.25073/2588-1086/vnucsce.341

VLSP 2021 - VieCap4H Challenge: Automatic Image Caption Generation for Healthcare Domain in Vietnamese

Abstract: This paper presents VieCap4H, a grand data challenge on automatic image caption generation for the healthcare domain in Vietnamese. VieCap4H is held as part of the eighth annual workshop on Vietnamese Language and Speech Processing (VLSP 2021). The task is framed as image captioning: given a static image, mostly depicting healthcare-related scenarios, participants are asked to design machine learning methods that generate natural-language captions in Vietnamese describing the visual content of the image.…

Cited by 4 publications
(8 citation statements)
References 26 publications
“…In Transformer architecture, we use N_encoder = 3, N_decoder = 3, d_model = 512 and the number of attention heads is 8. During inference, we use three values max_length ∈ [20, 22, 23], apply beam search with beam_size ∈ [3, 4, 5], and use 50 submissions in public-test round to evaluate. For the private test, we use two values max_length ∈ [22, 23, 24].…”
Section: Other Hyperparameters
Citation type: mentioning
confidence: 99%
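
The inference settings quoted above can be read as a small search grid. The following is a minimal sketch that only enumerates candidate settings; whether the citing authors tried every combination is not stated, so the full cross product is an assumption, and `inference_grid` and the dictionary keys are hypothetical names used for illustration.

```python
from itertools import product

# Values quoted in the citation statement for the public-test round.
MAX_LENGTHS = [20, 22, 23]
BEAM_SIZES = [3, 4, 5]


def inference_grid(max_lengths, beam_sizes):
    """Return every (max_length, beam_size) pair as a settings dict.

    Assumption: the full cross product is explored; the quoted text
    does not say which combinations were actually submitted.
    """
    return [
        {"max_length": m, "beam_size": b}
        for m, b in product(max_lengths, beam_sizes)
    ]


configs = inference_grid(MAX_LENGTHS, BEAM_SIZES)
for cfg in configs:
    print(cfg)
```

Under this assumption the grid holds nine settings, well below the 50 public-test submissions mentioned in the quote.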
“…For evaluation, we use the Bilingual Evaluation Understudy (BLEU) metric, which was first used for evaluating the performance of the captioning model in [1]. This metric is also used in the VieCap4H challenge [3]. BLEU score is commonly used in machine translation tasks.…”
Section: Metric
Citation type: mentioning
confidence: 99%
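
For readers unfamiliar with BLEU, a minimal sketch of a sentence-level score follows. It assumes uniform weights over 1- to 4-grams, whitespace-tokenized input, and no smoothing; the exact BLEU variant and tokenization used by the challenge are not given in the statement above, and the function names and example tokens are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        if not cand:
            return 0.0  # candidate is shorter than n tokens
        # Clip each candidate n-gram by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngram_counts(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
        if clipped == 0:
            return 0.0  # without smoothing, one zero precision zeroes the score
        log_precisions.append(math.log(clipped / sum(cand.values())))
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


# Hypothetical tokens, for illustration only.
reference = ["a", "doctor", "examines", "a", "patient"]
print(sentence_bleu(reference, [reference]))
print(sentence_bleu(["a", "doctor", "examines", "a"], [reference]))
```

The second call shows the brevity penalty: every n-gram of the shorter candidate matches the reference, yet the score falls below that of the exact match because the candidate is one token short.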
“…To encourage research on Vietnamese image captioning, [2] created a dataset for the Vietnamese domain, which also serves as a premise for research on Vietnamese image captioning in the healthcare domain. The VieCap4H Challenge 2021 [3] is a competition for developing machine learning algorithms that use Vietnamese to describe the visual content of healthcare settings, especially images that depict the COVID-19 pandemic. For similar tasks, the most recent studies, presented in [4] and [5], proposed a network with a deep Convolutional Neural Network (CNN) as the encoder and a Recurrent Neural Network (RNN) as the decoder.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…To evaluate the performance of different types of region features, a Transformer-based [35] model is trained to generate captions. Two benchmark datasets for image captioning in Vietnamese are used to evaluate the effectiveness: UIT-ViIC [13] and VieCap4H [14].…”
Section: Introduction
Citation type: mentioning
confidence: 99%