2022
DOI: 10.48550/arxiv.2201.04026
Preprint
Uni-EDEN: Universal Encoder-Decoder Network by Multi-Granular Vision-Language Pre-training

Cited by 1 publication (1 citation statement)
References: 0 publications
“…They constructed a decision fusion module to combine the outputs of Transformer modules at different granularities. To pretrain both the encoder for multi-modal representation extraction and the language decoder for sentence generation, reference [29] proposed a pretrained universal encoder-decoder network (Uni-EDEN) to support vision-language perception and generation. The model is pretrained with multi-granular vision-language proxy tasks: Masked Object Classification (MOC), Masked Region Phrase Generation (MRPG), Image-Sentence Matching (ISM), and Masked Sentence Generation (MSG).…”
Section: Related Work
confidence: 99%
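The citation statement describes multi-granular pre-training with four proxy tasks optimized together. A minimal sketch of how such per-task losses could be combined into one training objective is shown below; the task names follow the statement (MOC, MRPG, ISM, MSG), but the weighting scheme and function names are illustrative assumptions, not the paper's actual implementation.

```python
def pretraining_loss(task_losses, weights=None):
    """Combine per-task losses (e.g. MOC, MRPG, ISM, MSG) into a single
    scalar objective via a weighted sum. Equal weights by default."""
    if weights is None:
        weights = {task: 1.0 for task in task_losses}
    return sum(weights[task] * value for task, value in task_losses.items())

# Hypothetical loss values from one batch of the four proxy tasks:
total = pretraining_loss({"MOC": 0.8, "MRPG": 1.2, "ISM": 0.5, "MSG": 0.9})
print(total)  # 3.4 with equal weights
```

In practice such a combined objective would be backpropagated through the shared encoder-decoder so all proxy tasks shape the same representations.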