2021
DOI: 10.48550/arxiv.2111.08687
Preprint

INTERN: A New Learning Paradigm Towards General Vision

Abstract: Enormous waves of technological innovations over the past several years, marked by the advances in AI technologies, are profoundly reshaping the industry and the society. However, down the road, a key challenge awaits us, that is, our capability of meeting rapidly-growing scenario-specific demands is severely limited by the cost of acquiring a commensurate amount of training data. This difficult situation is in essence due to limitations of the mainstream learning paradigm: we need to train a new model for eac…

Cited by 7 publications (13 citation statements)
References 37 publications
“…Recent studies have demonstrated that TOV model trained by contrastive self-supervised learning with mass unlabeled nature images has impressive generalizability, which perform comparably well or even better than supervised learning methods across various computer vision tasks [11,16,21]. However, we experimentally find that directly using this pipeline to train TOV model for RSIU cannot obtain desired results.…”
Section: Training TOV Model For RSIU Based On a Human-like SSL Mechanism | mentioning
confidence: 73%
“…Unlike the machine vision that is "taught" by labeled data, human-like vision is achieved by holistic and joint models that can simultaneously solve real-world problems by unsupervised way [11]. The key reason is that human visual recognition system is not limited to a specific task or specific dataset, and human language based labels are not the prerequisite for constructing the human visual system.…”
Section: Introduction | mentioning
confidence: 99%
“…Recent progress has shown a great interest in general-purpose models [31,21,34,20,52,1] which can deal with a wide variety of input modalities and output tasks. Previous works [31,21] train models with a huge amount of image-text pairs by matching images to their captions.…”
Section: General-purpose Models | mentioning
confidence: 99%
“…Interestingly CLIP can be even used in text-guided image generation task (Style-CLIP [16]) and Embodied AI (EmbCLIP [8]). CLIP has also contributed to the development of general vision [20]. Witnessing CLIP's active community and wide applications, we propose the first work to benchmark CLIP.…”
Section: Related Work | mentioning
confidence: 99%