LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
Preprint, 2021
DOI: 10.48550/arxiv.2111.02114

Cited by 90 publications (136 citation statements); references 3 publications.
“…We pre-train the model for 20 epochs using a batch size of […] (Changpinyo et al, 2021), SBU captions (Ordonez et al, 2011)). We also experimented with an additional web dataset, LAION (Schuhmann et al, 2021), which contains 115M images with more noisy texts. More details about the datasets can be found in the appendix.…”
Section: Pre-training Details (mentioning)
confidence: 99%
“…We observe that the model performs best with a cosine similarity threshold of 0.3, as we achieve 0.84, 0.47, 0.80, and 0.58 for accuracy, precision, recall, and F1 score, respectively. Indeed, the 0.3 threshold is also used in previous work by Schuhmann et al [55], which inspected CLIP's cosine similarities between text and images and determined that 0.3 is a suitable threshold.…”
Section: Identifying Antisemitic and Islamophobic Images (mentioning)
confidence: 99%
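A minimal sketch of the threshold evaluation described in the excerpt above, assuming ground-truth match labels and precomputed CLIP cosine similarities are already available; the function name, variable names, and the use of scikit-learn are illustrative assumptions, and only the 0.3 threshold and the four reported metrics come from the cited text.

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def evaluate_threshold(similarities, labels, threshold=0.3):
    # Predict "match" (1) when the CLIP cosine similarity exceeds the threshold.
    predictions = [int(s > threshold) for s in similarities]
    return {
        "accuracy": accuracy_score(labels, predictions),
        "precision": precision_score(labels, predictions),
        "recall": recall_score(labels, predictions),
        "f1": f1_score(labels, predictions),
    }

# Hypothetical usage: sweep candidate thresholds and keep the best-scoring one.
# for t in (0.20, 0.25, 0.30, 0.35):
#     print(t, evaluate_threshold(sims, gold_labels, threshold=t))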
“…Besides CC12M, there also exist some other large-scale image-text datasets, such as WIT [41], WenLan [19], LAION-400M [39], and the datasets used in CLIP [38] and ALIGN [20]. More detailed discussions on them are provided in the Appendix.…”
Section: Pre-training Dataset (mentioning)
confidence: 99%
“…• LAION-400M [39] has 400 million image-text pairs and was recently released to the public. Instead of applying human-designed heuristics for data cleaning, this dataset relies on the CLIP [38] model to filter image-text pairs: the cosine similarity between image and text embeddings is computed, and pairs scoring below 0.3 are filtered out.…”
Section: B. Comparison of Image-Text Datasets (mentioning)
confidence: 99%
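The filtering step described in this excerpt can be sketched as follows, assuming the openai/clip-vit-base-patch32 checkpoint from the Hugging Face transformers library; the keep_pair helper and the commented usage are hypothetical, and only the 0.3 cosine-similarity cut-off comes from the cited text.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def keep_pair(image: Image.Image, caption: str, threshold: float = 0.3) -> bool:
    # Encode the image and its caption with CLIP.
    inputs = processor(text=[caption], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # Normalize the projected embeddings and take their dot product as the cosine similarity.
    image_emb = outputs.image_embeds / outputs.image_embeds.norm(dim=-1, keepdim=True)
    text_emb = outputs.text_embeds / outputs.text_embeds.norm(dim=-1, keepdim=True)
    similarity = (image_emb @ text_emb.T).item()
    # Keep the pair only if the caption matches the image well enough.
    return similarity > threshold

# Hypothetical usage: drop candidate pairs whose caption does not match the image.
# filtered = [(img, cap) for img, cap in candidate_pairs if keep_pair(img, cap)]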