2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.01174

R²GAN: Cross-Modal Recipe Retrieval With Generative Adversarial Network

Cited by 114 publications (115 citation statements)
References 27 publications
“…Finally, as our model is specifically designed for food recognition, a rich set of domain discriminators and regularizers is considered as loss functions. These include multi-label classification of ingredients as a semantic regularizer as in [33], image and text domain discriminators as in [12], and reconstruction of images from recipes for shared representation learning [33,36].…”
Section: Related Work
confidence: 99%
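The loss composition described in the citation above can be sketched as three terms summed together. This is a minimal illustrative sketch, not the cited models' actual implementation; all function names, weights, and toy inputs are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ingredient_regularizer(logits, labels, eps=1e-9):
    # Multi-label binary cross-entropy over ingredients: each output unit
    # predicts the presence (1) or absence (0) of one ingredient.
    p = sigmoid(logits)
    return -np.mean(labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps))

def modality_discriminator_loss(scores, from_image, eps=1e-9):
    # A domain discriminator scores embeddings as image-like (1) or
    # text-like (0); training the encoders to fool it aligns the modalities.
    p = sigmoid(scores)
    t = 1.0 if from_image else 0.0
    return -np.mean(t * np.log(p + eps) + (1 - t) * np.log(1 - p + eps))

def reconstruction_loss(generated, target):
    # Mean squared error between an image generated from the recipe
    # embedding and the real food image (shared-representation learning).
    return np.mean((generated - target) ** 2)

# Toy example combining the three terms with illustrative weights.
rng = np.random.default_rng(0)
logits = rng.normal(size=8)
labels = (rng.random(8) > 0.5).astype(float)
scores = rng.normal(size=4)
img = rng.random((16, 16))

total = (ingredient_regularizer(logits, labels)
         + modality_discriminator_loss(scores, from_image=True)
         + 0.5 * reconstruction_loss(img, img))
```

In practice each term would be weighted and backpropagated jointly; the relative weights here are placeholders.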
“…Based upon these prior works [4,7,9,29,33,36], this paper extends cross-modal retrieval to cross-domain food retrieval. Leveraging image-recipe pairs in a source domain, we consider the problem of food transfer as recognizing food in a target domain with new food categories and attributes.…”
Section: Introduction
confidence: 99%
“…[Micael et al 2018] extended [Salvador et al 2017] with a double-triplet strategy that jointly expresses both the retrieval loss and the classification loss for cross-modal retrieval. [Wang et al 2019; Zhu et al 2019] further introduced adversarial networks to impose modality alignment for cross-modal retrieval. [Salvador et al 2019] proposed a new architecture for ingredient prediction that exploits co-dependencies among ingredients without imposing order, generating cooking instructions from an image and its ingredients.…”
Section: Reference
confidence: 99%
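The double-triplet strategy mentioned above can be illustrated with a minimal triplet hinge loss. The function, margin, and embeddings below are illustrative assumptions, not the cited papers' actual settings:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    # Hinge on squared Euclidean distances: pull the matching pair
    # (anchor, positive) closer than the mismatched pair by at least `margin`.
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

# A "double-triplet" objective would sum one retrieval triplet (image
# anchor, paired recipe positive, unpaired recipe negative) with a second
# semantic triplet built from class labels -- toy 2-d embeddings here.
img = np.array([1.0, 0.0])
recipe_match = np.array([0.9, 0.1])
recipe_other = np.array([-1.0, 2.0])
loss = triplet_loss(img, recipe_match, recipe_other)
```

When the matching recipe is already closer than the mismatched one by more than the margin, the hinge gives zero loss, so training focuses on hard pairs.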
“…Recipe1M [27] is the only publicly available large-scale food dataset with English recipes and images. Many related works [6,26,27,32] are based on this dataset. The raw dataset contains more than 1 million recipes and almost 900k images.…”
Section: Experiments 4.1 Datasets
confidence: 99%
“…People tend to spend much time on recipes because cooking is closely tied to daily life. Much work has been done to deconstruct and understand food, including food classification [8,16], recipe-image embedding [6,27,32], and image-to-recipe generation [26]. Furthermore, visualizing a dish's appearance in advance would greatly help in designing new recipes, which makes image generation from given recipes a task of evident significance.…”
Section: Introduction
confidence: 99%