2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2017.327

Learning Cross-Modal Embeddings for Cooking Recipes and Food Images

Abstract: In this paper, we introduce Recipe1M, a new large-scale, structured corpus of over 1m cooking recipes and 800k food images. As the largest publicly available collection of recipe data, Recipe1M affords the ability to train high-capacity models on aligned, multi-modal data. Using these data, we train a neural network to find a joint embedding of recipes and images that yields impressive results on an image-recipe retrieval task. Additionally, we demonstrate that regularization via the addition of a high-level c…
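The retrieval setup the abstract describes maps recipes and images into a shared space and ranks one modality against the other by similarity. This is not the paper's architecture; it is a minimal sketch of that retrieval step, with random vectors standing in for the outputs of the two trained encoders:

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, axis=-1):
    # Project vectors onto the unit sphere so dot products equal cosine similarity.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# Stand-ins for encoder outputs: in the paper these come from neural networks
# trained on aligned recipe-image pairs; here they are random placeholders.
dim = 64
recipe_emb = l2_normalize(rng.standard_normal((5, dim)))  # 5 recipes
image_emb = l2_normalize(rng.standard_normal((5, dim)))   # 5 food images

# Image-to-recipe retrieval: score every recipe against every image
# by cosine similarity, then rank recipes per image, best match first.
similarity = image_emb @ recipe_emb.T       # shape (5 images, 5 recipes)
ranking = np.argsort(-similarity, axis=1)
top1 = ranking[:, 0]                        # retrieved recipe index per image
```

With trained encoders, aligned pairs land close together in the shared space, so the correct recipe appears near the top of each image's ranking; retrieval metrics such as median rank are computed from exactly this ranking matrix.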

Cited by 315 publications (559 citation statements)
References 15 publications (22 reference statements)
“…The proposed method is trained and evaluated on Recipe1M [22], the largest publicly available multi-modal food database. Recipe1M provides over 1 million recipes (ingredients and instructions), accompanied by one or more images per recipe, leading to 13 million images.…”
[Figure 2: Text-image embedding model with optional semantic classifier for semantic regularization according to [17] and with Ingredient Attention based instruction encoding]
Section: Materials and Methods, 2.1 Database
confidence: 99%
“…We use the data in the Recipe1M [17] dataset. Because the fraction slash used on recipe websites is not an ASCII character, it is missing from the dataset after preprocessing.…”
Section: Experiments, 4.1 Dataset
confidence: 99%
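The statement above points at a concrete preprocessing pitfall: recipe quantities like "½" use Unicode vulgar-fraction characters (and U+2044 FRACTION SLASH), which an ASCII-only pipeline silently drops. A hypothetical repair step (not from the cited paper) is to expand these characters to ASCII "a/b" before filtering, using NFKC normalization from Python's standard library:

```python
import unicodedata

def asciify_fractions(text: str) -> str:
    # NFKC expands vulgar-fraction characters like "½" into "1⁄2"
    # (digits joined by U+2044 FRACTION SLASH); mapping that slash
    # to ASCII "/" preserves the quantity through ASCII-only filtering.
    return unicodedata.normalize("NFKC", text).replace("\u2044", "/")

asciify_fractions("Add ½ cup sugar")  # -> "Add 1/2 cup sugar"
```

Note that NFKC also rewrites other compatibility characters (e.g. ligatures), so in a real pipeline one would check its effect on the full corpus rather than apply it blindly.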
“…However, there is another approach to this problem: recognize the food ingredients directly from the image. This has been presented in a few recent solutions by Chen et al. (25,26) and Salvador et al. (27), which detail the process of recognizing ingredients from food images and then linking them with recipes containing those ingredients.…”
Section: Food Image Recognition
confidence: 99%