2022
DOI: 10.48550/arxiv.2204.01678
Preprint

MultiMAE: Multi-modal Multi-task Masked Autoencoders

Cited by 14 publications (15 citation statements)
References 0 publications
“…It would also be interesting to study introducing auxiliary prediction for other modalities, such as audio. Another weakness is that our model operates only on RGB pixels from a single camera viewpoint; we look forward to a future work that incorporates different input modalities such as proprioceptive states and point clouds, building on top of the recent multi-modal learning approaches [52,53]. Finally, our approach trains behaviors from scratch, which makes it still too sample-inefficient to be used in real-world scenarios.…”
Section: Discussion
confidence: 99%
“…In computer vision, a popular method for MTL is to employ a single encoder to learn a shared representation, followed by numerous task-specific decoders [14,15]. In this paper, a similar strategy is employed by training one main backbone model together with several small task-specific heads.…”
Section: Multi-task Learning and Self-training
confidence: 99%
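The shared-encoder/multi-head pattern this statement describes can be sketched in a few lines. The sketch below is a minimal PyTorch illustration with hypothetical task names (depth, semantic segmentation), not the cited papers' actual architectures.

```python
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    """One shared backbone encoder feeding several small task-specific heads."""

    def __init__(self, feat_dim=256):
        super().__init__()
        # Shared representation learner (a stand-in for a ViT/ResNet backbone).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # Lightweight task-specific heads; names and output sizes are illustrative.
        self.heads = nn.ModuleDict({
            "depth": nn.Conv2d(feat_dim, 1, kernel_size=1),
            "semseg": nn.Conv2d(feat_dim, 21, kernel_size=1),
        })

    def forward(self, rgb):
        shared = self.encoder(rgb)  # one shared representation for all tasks
        return {name: head(shared) for name, head in self.heads.items()}

model = MultiTaskModel()
outputs = model(torch.randn(2, 3, 64, 64))
print({k: v.shape for k, v in outputs.items()})
```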
“…Pseudo labeling is a one-time preprocessing method applicable to RGB datasets of variable size. Compared to the training cost, this phase is computationally inexpensive [14].…”
Section: Pseudo-labeled Multi-task Training
confidence: 99%
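A minimal sketch of the one-time pseudo-labeling pass the statement refers to: a frozen, off-the-shelf teacher network is run once over the RGB dataset and its predictions are cached as pseudo labels for later multi-task training. The teacher, loader, and file layout here are assumptions for illustration.

```python
import os
import torch

@torch.no_grad()
def pseudo_label_dataset(rgb_loader, teacher, out_dir):
    """Run a frozen teacher once over all RGB images and cache its predictions.

    This is a single preprocessing pass, separate from (and much cheaper than)
    the subsequent multi-task training run.
    """
    os.makedirs(out_dir, exist_ok=True)
    teacher.eval()
    for idx, rgb in enumerate(rgb_loader):
        pred = teacher(rgb)  # e.g. a depth or segmentation map
        torch.save(pred.cpu(), os.path.join(out_dir, f"{idx:06d}.pt"))
```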
“…GMAE [36] adapts MAE to the domain of graphs. MultiMAE [37] enhances the flexibility of MAE by enabling it to take optional inputs from different modalities and correspondingly adds further training objectives to facilitate multi-modal learning. However, these works fail to handle temporal and multi-spectral input.…”
Section: Introduction
confidence: 99%
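The statement summarizes MultiMAE's design at a high level: tokens from whichever modalities happen to be present are masked, encoded jointly, and reconstructed by per-modality decoders, one loss term per modality. The sketch below is a heavily simplified, hypothetical rendering of that idea (it reconstructs only visible tokens, in patch space), not the released MultiMAE code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMultiModalMAE(nn.Module):
    """Simplified masked autoencoder that accepts an optional subset of modalities."""

    def __init__(self, dim=128, patch=16):
        super().__init__()
        self.patch = patch
        self.chans = {"rgb": 3, "depth": 1}  # illustrative modalities only
        self.embed = nn.ModuleDict({
            m: nn.Conv2d(c, dim, kernel_size=patch, stride=patch)
            for m, c in self.chans.items()
        })
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # One lightweight decoder (and hence one reconstruction loss) per modality.
        self.decoders = nn.ModuleDict({
            m: nn.Linear(dim, c * patch * patch) for m, c in self.chans.items()
        })

    def forward(self, inputs, mask_ratio=0.5):
        visible, targets, spans, start = [], {}, {}, 0
        for m, x in inputs.items():  # iterate only over the modalities provided
            patches = F.unfold(x, self.patch, stride=self.patch).transpose(1, 2)
            tok = self.embed[m](x).flatten(2).transpose(1, 2)  # (B, N, dim)
            keep = torch.randperm(tok.shape[1])[: int(tok.shape[1] * (1 - mask_ratio))]
            visible.append(tok[:, keep])
            targets[m] = patches[:, keep]  # simplification: reconstruct visible patches
            spans[m] = (start, start + keep.numel())
            start += keep.numel()
        z = self.encoder(torch.cat(visible, dim=1))  # joint encoding of all modalities
        return {m: F.mse_loss(self.decoders[m](z[:, a:b]), targets[m])
                for m, (a, b) in spans.items()}

model = TinyMultiModalMAE()
losses = model({"rgb": torch.randn(2, 3, 64, 64)})  # depth input is optional
print(losses)
```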