2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr46437.2021.01522
Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning

Cited by 48 publications (76 citation statements). References 31 publications.

“…In addition, the author proposed a self-supervised loss function, computed on pairs of the individual recipe components, to leverage the semantic relationships within recipes. In [8], by contrast, the author developed a neural network with a joint embedding learned over the recipes and images in a common space. A high-level classification task was added to the model to further improve performance.…”
Section: Related Work
Mentioning confidence: 99%
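To make the quoted objective concrete, below is a minimal sketch of one way a self-supervised loss over pairs of recipe components could be computed. The function name, the margin value, the mean pooling, and the bidirectional triplet formulation with in-batch negatives are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch (not the authors' code) of a pairwise self-supervised loss,
# assuming each recipe component (title, ingredients, instructions) has already
# been encoded into a shared embedding space of the same dimension.
import itertools
import torch
import torch.nn.functional as F

def pairwise_component_loss(components, margin=0.3):
    """components: dict mapping component name -> (batch, dim) embeddings.
    For every pair of components, pulls embeddings of the same recipe together
    and pushes apart embeddings of other recipes in the batch."""
    total, n_pairs = 0.0, 0
    for (_, a), (_, b) in itertools.combinations(components.items(), 2):
        a = F.normalize(a, dim=-1)
        b = F.normalize(b, dim=-1)
        sim = a @ b.t()                  # (batch, batch) cosine similarities
        pos = sim.diag().unsqueeze(1)    # similarities of matching recipes
        # hinge on every non-matching pair, in both retrieval directions
        mask = ~torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
        loss_ab = F.relu(margin + sim - pos)[mask].mean()
        loss_ba = F.relu(margin + sim.t() - pos)[mask].mean()
        total += loss_ab + loss_ba
        n_pairs += 1
    return total / n_pairs

# Usage with random stand-in embeddings:
# comps = {k: torch.randn(8, 512) for k in ("title", "ingredients", "instructions")}
# loss = pairwise_component_loss(comps)
```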
“…We first start with the models that deliver state-of-the-art (SOTA) accuracy. Specifically, we focus on two models that frame the problem as a cross-modal recipe retrieval task [6], [8]. The main difference between these two models lies in the design of the recipe encoder: one uses a two-stage LSTM [6], while the other uses hierarchical transformers [8].…”
Section: Approach
Mentioning confidence: 99%
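To illustrate the encoder distinction the quote draws, here is a minimal sketch of a hierarchical transformer text encoder: a token-level transformer produces one vector per sentence (e.g., per ingredient line), and a second transformer aggregates the sentence vectors into a component embedding. The class name, layer counts, dimensions, and mean pooling are illustrative assumptions rather than the paper's exact architecture.

```python
# Minimal sketch of a hierarchical transformer encoder, assuming tokenized
# input of shape (batch, n_sentences, n_tokens) with 0 as the padding id.
import torch
import torch.nn as nn

class HierarchicalTransformerEncoder(nn.Module):
    def __init__(self, vocab_size, dim=512, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim, padding_idx=0)
        # level 1: a transformer over the tokens of each sentence
        word_layer = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.word_enc = nn.TransformerEncoder(word_layer, n_layers)
        # level 2: a transformer over the resulting sentence vectors
        sent_layer = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.sent_enc = nn.TransformerEncoder(sent_layer, n_layers)

    def forward(self, tokens):
        b, s, t = tokens.shape                    # (batch, n_sent, n_tok)
        x = self.embed(tokens.view(b * s, t))     # (b*s, t, dim)
        x = self.word_enc(x).mean(dim=1)          # pool tokens -> sentence vectors
        x = x.view(b, s, -1)
        x = self.sent_enc(x).mean(dim=1)          # pool sentences -> one vector
        return x                                  # (batch, dim)

# enc = HierarchicalTransformerEncoder(vocab_size=10000)
# out = enc(torch.randint(1, 10000, (2, 5, 12)))  # -> torch.Size([2, 512])
```

By contrast, a two-stage LSTM encoder would replace both transformer levels with recurrent layers, processing tokens and then sentences sequentially rather than with self-attention.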