Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing 2023
DOI: 10.18653/v1/2023.emnlp-main.119
|View full text |Cite
|
Sign up to set email alerts
|

A Suite of Generative Tasks for Multi-Level Multimodal Webpage Understanding

Andrea Burns,
Krishna Srinivasan,
Joshua Ainslie
et al.

Abstract: Webpages have been a rich, scalable resource for vision-language and language only tasks. Yet only pieces of webpages are kept in existing datasets: image-caption pairs, long text articles, or raw HTML, never all in one place. Webpage tasks have resultingly received little attention and structured image-text data left underused. To study multimodal webpage understanding, we introduce the Wikipedia Webpage suite (WikiWeb2M) containing 2M pages with all of the associated image, text, and structure data 1 . We ve… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Year Published

2024
2024
2024
2024

Publication Types

Select...
1

Relationship

0
1

Authors

Journals

citations
Cited by 1 publication
references
References 23 publications
(33 reference statements)
0
0
0
Order By: Relevance