2021 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv48922.2021.01157
Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-Modal Pretraining

Cited by 39 publications (30 citation statements)
References 33 publications
“…Specifically, a novel Entity-Graph Enhanced Cross-Modal Pretraining (EGE-CMP) model is proposed for instance-level commodity retrieval, which explicitly injects entity knowledge in both node-based and subgraph-based ways into the multi-modal networks via a self-supervised hybrid-stream transformer. This reduces the confusion between different object contents, thereby effectively guiding the network to focus on entities with real semantics. Experimental results verify the efficacy and generalizability of our EGE-CMP, outperforming several SOTA cross-modal baselines such as CLIP [1], UNITER [2] and CAPTURE [3].…”
supporting
confidence: 58%
“…The results of experiments on both multi-product retrieval and identical-product retrieval tasks show the superiority of our EGE-CMP over SOTA cross-modal baselines such as ViLBERT [7], CLIP [1], UNITER [2], and CAPTURE [3] on all major criteria by a large margin. Moreover, extensive ablation experiments are conducted to demonstrate the generalizability of EGE-CMP and to investigate various essential factors of our proposed task.…”
Section: Introduction
mentioning
confidence: 95%