A Dataset and Baselines for Visual Question Answering on Art

García, Noa; Ye, Chentao; Liu, Zihua; Hu, Qingtao; Otani, Mayu; Chu, Chenhui; Nakashima, Yuta; Mitamura, Teruko

doi:10.1007/978-3-030-66096-3_8

Cited by 33 publications

(21 citation statements)

References 46 publications


“…NoisyArt [5] is a dataset composed of artwork images collected from Google Images and Flickr containing also metadata (e.g., artwork title, comments, description and creation location) gathered from DBpedia. A recent work presented the AQUA [10] (Art QUestion Answering) dataset which contains question-answer pairs automatically generated using state-of-the-art question generation methods on the basis of paintings and comments provided by the SemArt [10] dataset. EGO-CH [28] is a dataset of egocentric videos for visitors' behaviour understanding in cultural sites.…”

Section: Datasets In Cultural Sites (mentioning)

confidence: 99%

Weakly Supervised Attended Object Detection Using Gaze Data as Annotations

Mazzamuto, Ragusa, Furnari, et al. 2022

Preprint


We consider the problem of detecting and recognizing the objects observed by visitors (i.e., attended objects) in cultural sites from egocentric vision. A standard approach to the problem involves detecting all objects and selecting the one which best overlaps with the gaze of the visitor, measured through a gaze tracker. Since labeling large amounts of data to train a standard object detector is expensive in terms of costs and time, we propose a weakly supervised version of the task which leans only on gaze data and a frame-level label indicating the class of the attended object. To study the problem, we present a new dataset composed of egocentric videos and gaze coordinates of subjects visiting a museum. We hence compare three different baselines for weakly supervised attended object detection on the collected data. Results show that the considered approaches achieve satisfactory performance in a weakly supervised manner, which allows for significant time savings with respect to a fully supervised detector based on Faster R-CNN. To encourage research on the topic, we publicly release the code and the dataset at the following url: https://iplab.dmi.unict.it/WS_OBJ_DET/


“…In [36], the authors annotated a subset of the ArtPedia dataset with visual and contextual question–answer pairs and introduced a question classifier that discriminates between visual and contextual questions and a model that is able to answer both types of questions. In [37], the authors introduce a novel dataset AQUA (Art QUestion Answering), which consists of automatically generated visual and knowledge-based question-answer pairs, and also present a two-branch model where the visual and knowledge questions are handled independently.…”

Section: Related Work (mentioning)

confidence: 99%

Towards Generating and Evaluating Iconographic Image Captions of Artworks

Cetinić 2021

J. Imaging


To automatically generate accurate and meaningful textual descriptions of images is an ongoing research challenge. Recently, a lot of progress has been made by adopting multimodal deep learning approaches for integrating vision and language. However, the task of developing image captioning models is most commonly addressed using datasets of natural images, while not many contributions have been made in the domain of artwork images. One of the main reasons for that is the lack of large-scale art datasets of adequate image-text pairs. Another reason is the fact that generating accurate descriptions of artwork images is particularly challenging because descriptions of artworks are more complex and can include multiple levels of interpretation. It is therefore also especially difficult to effectively evaluate generated captions of artwork images. The aim of this work is to address some of those challenges by utilizing a large-scale dataset of artwork images annotated with concepts from the Iconclass classification system. Using this dataset, a captioning model is developed by fine-tuning a transformer-based vision-language pretrained model. Due to the complex relations between image and text pairs in the domain of artwork images, the generated captions are evaluated using several quantitative and qualitative approaches. The performance is assessed using standard image captioning metrics and a recently introduced reference-free metric. The quality of the generated captions and the model’s capacity to generalize to new data is explored by employing the model to another art dataset to compare the relation between commonly generated captions and the genre of artworks. The overall results suggest that the model can generate meaningful captions that indicate a stronger relevance to the art historical context, particularly in comparison to captions obtained from models trained only on natural image datasets.


“…They introduced a question classifier that discriminates between visual and contextual questions and a model capable of answering both types of questions. Garcia et al. [49] presented a novel dataset AQUA, which consists of automatically generated visual- and knowledge-based QA pairs, and introduced a two-branch model where the visual and knowledge questions are managed independently. Apart from VQA, a few recent works addressed the task of image captioning where the goal is to automatically generate accurate textual descriptions of images.…”

Section: Multimodal Tasks (mentioning)

confidence: 99%

Understanding and Creating Art with AI: Review and Outlook

Cetinić, She 2021

Preprint


Technologies related to artificial intelligence (AI) have a strong impact on the changes of research and creative practices in visual arts. The growing number of research initiatives and creative applications that emerge in the intersection of AI and art, motivates us to examine and discuss the creative and explorative potentials of AI technologies in the context of art. This paper provides an integrated review of two facets of AI and art: 1) AI is used for art analysis and employed on digitized artwork collections; 2) AI is used for creative purposes and generating novel artworks. In the context of AI-related research for art understanding, we present a comprehensive overview of artwork datasets and recent works that address a variety of tasks such as classification, object detection, similarity retrieval, multimodal representations, computational aesthetics, etc. In relation to the role of AI in creating art, we address various practical and theoretical aspects of AI Art and consolidate related works that deal with those topics in detail. Finally, we provide a concise outlook on the future progression and potential impact of AI technologies on our understanding and creation of art.
