2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2017.327

Learning Cross-Modal Embeddings for Cooking Recipes and Food Images

Abstract: In this paper, we introduce Recipe1M, a new large-scale, structured corpus of over 1m cooking recipes and 800k food images. As the largest publicly available collection of recipe data, Recipe1M affords the ability to train high-capacity models on aligned, multi-modal data. Using these data, we train a neural network to find a joint embedding of recipes and images that yields impressive results on an image-recipe retrieval task. Additionally, we demonstrate that regularization via the addition of a high-level c…
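The retrieval setup the abstract describes maps recipes and images into a shared space and ranks one modality against the other by similarity. This is not the paper's architecture; it is a minimal sketch of that retrieval step, with random vectors standing in for the outputs of the two trained encoders:

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, axis=-1):
    # Project vectors onto the unit sphere so dot products equal cosine similarity.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# Stand-ins for encoder outputs: in the paper these come from neural networks
# trained on aligned recipe-image pairs; here they are random placeholders.
dim = 64
recipe_emb = l2_normalize(rng.standard_normal((5, dim)))  # 5 recipes
image_emb = l2_normalize(rng.standard_normal((5, dim)))   # 5 food images

# Image-to-recipe retrieval: score every recipe against every image
# by cosine similarity, then rank recipes per image, best match first.
similarity = image_emb @ recipe_emb.T       # shape (5 images, 5 recipes)
ranking = np.argsort(-similarity, axis=1)
top1 = ranking[:, 0]                        # retrieved recipe index per image
```

With trained encoders, aligned pairs land close together in the shared space, so the correct recipe appears near the top of each image's ranking; retrieval metrics such as median rank are computed from exactly this ranking matrix.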

Cited by 315 publications (559 citation statements)
References 15 publications (22 reference statements)
“…The proposed method is trained and evaluated on Recipe1M [22], the largest publicly available multi-modal food database. Recipe1M provides over 1 million recipes (ingredients and instructions), accompanied by one or more images per recipe, leading to 13 million images.…”
[Figure 2: Text-image embedding model with optional semantic classifier for semantic regularization according to [17] and with Ingredient Attention based instruction encoding]
Section: Materials and Methods, 2.1 Database
confidence: 99%
“…We use the data in the Recipe1M [17] dataset. Because the fraction slash used on recipe websites is not an ASCII character, it is missing from the dataset after preprocessing.…”
Section: Experiments, 4.1 Dataset
confidence: 99%
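The statement above points at a concrete preprocessing pitfall: recipe quantities like "½" use Unicode vulgar-fraction characters (and U+2044 FRACTION SLASH), which an ASCII-only pipeline silently drops. A hypothetical repair step (not from the cited paper) is to expand these characters to ASCII "a/b" before filtering, using NFKC normalization from Python's standard library:

```python
import unicodedata

def asciify_fractions(text: str) -> str:
    # NFKC expands vulgar-fraction characters like "½" into "1⁄2"
    # (digits joined by U+2044 FRACTION SLASH); mapping that slash
    # to ASCII "/" preserves the quantity through ASCII-only filtering.
    return unicodedata.normalize("NFKC", text).replace("\u2044", "/")

asciify_fractions("Add ½ cup sugar")  # -> "Add 1/2 cup sugar"
```

Note that NFKC also rewrites other compatibility characters (e.g. ligatures), so in a real pipeline one would check its effect on the full corpus rather than apply it blindly.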
“…However, there is another approach to this problem: recognize the food ingredients directly from the image. This has been presented in a few recent solutions by Chen et al. (25,26) and Salvador et al. (27), which detail the process of recognizing ingredients from food images and then linking them with recipes containing those ingredients.…”
Section: Food Image Recognition
confidence: 99%