2022
DOI: 10.1613/jair.1.13113

Image Captioning as an Assistive Technology: Lessons Learned from VizWiz 2020 Challenge

Abstract: Image captioning has recently demonstrated impressive progress, largely owing to the introduction of neural network algorithms trained on curated datasets like MS-COCO. Work in this field is often motivated by the promise of deploying captioning systems in practical applications. However, the scarcity of data and contexts in many competition datasets limits the utility of systems trained on them as an assistive technology in real-world settings, such as helping visually impaired people nav…

Citations: cited by 13 publications (7 citation statements)
References: 41 publications (51 reference statements)
“…Here, all unseen models are considered as one class. 9 In our evaluation, we treat DALL•E 2 as one unseen model (as mentioned before). We first divide the datasets into training, validation, and testing parts.…”
Section: Discussion (mentioning)
confidence: 99%
“…Here, we explore whether the quality of the BLIP-generated prompts affects the detection performance. To measure the quality of the generated prompts by BLIP, we leverage a new term called prompt descriptiveness [9,10,23,35]. Prompt descriptiveness can be quantitatively measured by computing the cosine similarity between a prompt's embedding and its image's embedding generated by CLIP.…”
Section: Ablation Study (mentioning)
confidence: 99%
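
The cosine-similarity measure described in the statement above can be sketched in a few lines. This is a minimal illustration, not the citing authors' code: the function name `cosine_similarity` and the three-dimensional toy vectors are assumptions made here, standing in for the prompt and image embeddings that CLIP would actually produce.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings (assumption): real CLIP embeddings are
# much higher-dimensional and come from the model's text and image encoders.
prompt_embedding = [0.2, 0.1, 0.7]
image_embedding = [0.25, 0.05, 0.65]

# Higher values mean the prompt embedding points in nearly the same
# direction as the image embedding, i.e. a more descriptive prompt.
descriptiveness = cosine_similarity(prompt_embedding, image_embedding)
print(f"prompt descriptiveness: {descriptiveness:.3f}")
```

The score lies in [-1, 1]; comparing it across prompts for the same image gives a relative ranking of how closely each prompt matches that image.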
“…This technological advancement serves to connect visual and textual data, so enabling deeper understanding of image content and creating opportunities for diverse applications [1]. The field of image captioning is gaining considerable interest owing to its capacity to boost image accessibility, assist individuals with visual impairments, automate content creation, and enhance image retrieval systems [2]. Especially in video summarization, image caption generation is a potent tool with uses that go beyond individual images.…”
Section: Introduction (mentioning)
confidence: 99%
“…With the continuous development of computer vision technology, sports video analysis technology has been widely used in the event analysis of sports competitions. It can provide athletes and coaches with corresponding data as a reference through video analysis and make a relatively systematic evaluation of individual athletes' and groups' performance in sports competitions [1]. In recent years, the number of sports videos has increased geometrically, and at the same time, there is a large amount of interference information in the huge amount of sports videos [2].…”
Section: Introduction (mentioning)
confidence: 99%