2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021
DOI: 10.1109/cvpr46437.2021.01246

Kaleido-BERT: Vision-Language Pre-training on Fashion Domain

Cited by 81 publications (64 citation statements)
References 39 publications
“…the state-of-art commerce-domain pre-trained models [11,51]. We found even our smallest model already outperforms [11,51] with a clear margin, indicating CommerceMM's superior transferability.…”
Section: Transferability To Academic Dataset (mentioning)
confidence: 72%
“…One is using the ITM head to predict the matching score between the input image-text pair and rank the scores of all pairs [5,11,51].…”
Section: Downstream Tasks (mentioning)
confidence: 99%
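
The retrieval strategy quoted above (score each image-text pair with an ITM head, then rank all pairs by score) can be illustrated roughly as follows. This is only a minimal sketch: `rank_by_itm_score`, `toy_itm_score`, and `TOY_CAPTIONS` are hypothetical names, and the word-overlap scorer is a stand-in for a trained image-text matching head, not the method of any cited paper.

```python
from typing import Callable, List, Sequence, Tuple


def rank_by_itm_score(
    query_text: str,
    candidate_images: Sequence[str],
    itm_score: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Score every (image, text) pair and return candidates sorted best-first."""
    scored = [(img, itm_score(img, query_text)) for img in candidate_images]
    # sorted() is stable, so tied candidates keep their original order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Assumption: a toy caption table replaces real image features.
TOY_CAPTIONS = {
    "img_a": "red floral dress",
    "img_b": "black leather boots",
    "img_c": "red leather jacket",
}


def toy_itm_score(image_id: str, text: str) -> float:
    """Hypothetical stand-in for an ITM head: fraction of query words matched."""
    caption_words = set(TOY_CAPTIONS[image_id].split())
    query_words = set(text.split())
    return len(caption_words & query_words) / len(query_words)


ranking = rank_by_itm_score("red leather jacket", list(TOY_CAPTIONS), toy_itm_score)
for image_id, score in ranking:
    print(f"{image_id}: {score:.2f}")
```

In a real system the scoring callable would wrap a forward pass of the pre-trained model; the ranking step itself stays the same.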
“…The background and other garment items in a given image are thus distractions and should be removed. To this end, a series of pre-processing steps are introduced: (1) We use a salient object detection model [41,57] to remove the background, which is an easy task given the typical clean background in fashion catalog images. (2) When there are multiple garments with the same category in one image (e.g., shoes and gloves), if they do not overlap, we only keep the one with the largest pixel area; (3) We delete the masks of garment parts (e.g., sleeves and pockets) but merge their attributes into the garments they belong to; (4) We delete the garments that have low-resolution or extreme aspect ratio; (5) If there are pixels of other garments in the bounding box, we mask these excess pixels with gray color.…”
Section: A Additional Information On Uigr Dataset (mentioning)
confidence: 99%
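
Steps (2) and (4) of the pre-processing quoted above can be sketched as a simple filter over garment annotations. Everything here is an assumption made for illustration: the `Garment` record, the helper names, the reading of "do not overlap" as "no pair of bounding boxes intersects", and the thresholds `min_side=32` and `max_aspect=4.0`, which the quoted text does not specify.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Garment:
    category: str
    box: Tuple[int, int, int, int]  # (x0, y0, x1, y1), max corner exclusive
    pixel_area: int  # number of mask pixels


def boxes_overlap(a: Garment, b: Garment) -> bool:
    """True if the two bounding boxes intersect in a region of positive area."""
    return (a.box[0] < b.box[2] and b.box[0] < a.box[2]
            and a.box[1] < b.box[3] and b.box[1] < a.box[3])


def is_usable(g: Garment, min_side: int = 32, max_aspect: float = 4.0) -> bool:
    """Step (4): reject low-resolution or extreme-aspect-ratio garments.

    Both thresholds are hypothetical placeholders.
    """
    width = g.box[2] - g.box[0]
    height = g.box[3] - g.box[1]
    if min(width, height) < min_side:
        return False
    return max(width, height) / min(width, height) <= max_aspect


def filter_garments(garments: List[Garment]) -> List[Garment]:
    # Step (2): within a category, if no two instances overlap,
    # keep only the instance with the largest pixel area.
    by_category: Dict[str, List[Garment]] = {}
    for g in garments:
        by_category.setdefault(g.category, []).append(g)
    kept: List[Garment] = []
    for group in by_category.values():
        disjoint = not any(boxes_overlap(a, b) for a, b in combinations(group, 2))
        if len(group) > 1 and disjoint:
            kept.append(max(group, key=lambda g: g.pixel_area))
        else:
            kept.extend(group)
    # Step (4): drop garments that fail the resolution / aspect-ratio check.
    return [g for g in kept if is_usable(g)]


example = [
    Garment("shoes", (0, 0, 100, 60), 4000),
    Garment("shoes", (200, 0, 290, 60), 3500),
    Garment("dress", (50, 100, 250, 500), 60000),
    Garment("belt", (0, 600, 400, 620), 5000),  # only 20 px tall
]
result = filter_garments(example)
print([g.category for g in result])
```

Background removal, part-mask merging, and gray masking (steps 1, 3, and 5) operate on pixel data and are left out of this sketch.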