2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.01184

Learning Cross-Modal Embeddings With Adversarial Networks for Cooking Recipes and Food Images

Cited by 117 publications (212 citation statements)
References 29 publications
“…Based upon these prior works [4,7,9,29,33,36], this paper extends from cross-modal to cross-domain food retrieval. Leveraging image-recipe pairs in a source domain, we consider the problem of food transfer as recognizing food in a target domain with new food categories and attributes.…”
Section: Introduction (mentioning)
confidence: 99%
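The citing work above builds on retrieval in a shared image-recipe embedding space, the core setup of the cited paper. Below is a minimal PyTorch sketch of that setup; the encoder architectures, feature dimensions, and the retrieve helper are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageEncoder(nn.Module):
    """Hypothetical: projects pooled CNN features into the shared space."""
    def __init__(self, feat_dim=2048, embed_dim=512):
        super().__init__()
        self.proj = nn.Linear(feat_dim, embed_dim)

    def forward(self, x):                        # x: (B, feat_dim)
        return F.normalize(self.proj(x), dim=-1)

class RecipeEncoder(nn.Module):
    """Hypothetical: mean-pools word embeddings of the recipe, then projects."""
    def __init__(self, vocab_size=20000, word_dim=300, embed_dim=512):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, word_dim)
        self.proj = nn.Linear(word_dim, embed_dim)

    def forward(self, tokens):                   # tokens: (B, T) word ids
        return F.normalize(self.proj(self.emb(tokens).mean(dim=1)), dim=-1)

def retrieve(recipe_emb, image_embs, k=5):
    """Rank a gallery of image embeddings against one recipe embedding.
    Embeddings are L2-normalized, so the dot product is cosine similarity."""
    sims = image_embs @ recipe_emb               # (N,)
    return sims.topk(k).indices
```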
“…Jin et al. analyzed the images in true news and fake news in terms of, e.g., their clarity [5]. Along with the recent advances in deep learning, various RNNs and CNNs have been developed for multi-modal fake news detection and related tasks [4,7,18,21,23,24]. To learn the multi-modal (textual and visual) representation of news content, Jin et al. developed VGG-19 and LSTM with an attention mechanism [4], and Khattar et al. designed an encoder-decoder mechanism [7].…”
Section: Content-based Fake News Detection (mentioning)
confidence: 99%
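To make the fusion approach described above concrete, here is a minimal PyTorch sketch of attention-guided text-image fusion for fake-news classification. The class name, dimensions, and dot-product attention are assumptions for illustration; the cited models [4,7] differ in architecture and detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiModalFakeNewsNet(nn.Module):
    """Sketch: BiLSTM text encoder, projected image features, and
    image-guided attention over word positions (all dims assumed)."""
    def __init__(self, vocab_size=30000, word_dim=300, hid=256, img_feat_dim=4096):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, word_dim)
        self.lstm = nn.LSTM(word_dim, hid, batch_first=True, bidirectional=True)
        self.img_proj = nn.Linear(img_feat_dim, 2 * hid)
        self.cls = nn.Linear(4 * hid, 2)             # real vs. fake logits

    def forward(self, tokens, img_feats):
        h, _ = self.lstm(self.emb(tokens))           # (B, T, 2*hid)
        img = torch.tanh(self.img_proj(img_feats))   # (B, 2*hid)
        # Image features attend over word positions (dot-product scores).
        attn = F.softmax((h * img.unsqueeze(1)).sum(-1), dim=1)   # (B, T)
        text = (attn.unsqueeze(-1) * h).sum(1)       # (B, 2*hid)
        return self.cls(torch.cat([text, img], dim=-1))
```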
“…[Micael et al. 2018] extended [Salvador et al. 2017] by providing a double-triplet strategy to jointly express both the retrieval loss and the classification loss for cross-modal retrieval. [Wang et al. 2019; Zhu et al. 2019] further introduced adversarial networks to impose modality alignment for cross-modal retrieval. [Salvador et al. 2019] proposed a new architecture for ingredient prediction that exploits co-dependencies among ingredients without imposing order, and generates cooking instructions from an image and its ingredients.…”
Section: Reference (mentioning)
confidence: 99%
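The combination this statement describes, a retrieval (triplet) loss plus an adversarial objective that aligns the two modalities, can be sketched as follows. The one-position-shift negative sampling, discriminator architecture, and margin are illustrative assumptions; the cited papers use more elaborate designs (e.g., hard-negative mining).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Modality discriminator: tries to tell image embeddings from recipe embeddings.
disc = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 1))
triplet = nn.TripletMarginLoss(margin=0.3)

def alignment_losses(img_emb, rec_emb):
    """img_emb, rec_emb: (B, 512) embeddings of paired image/recipe batches.
    Returns retrieval, discriminator, and adversarial (encoder) losses;
    l_disc updates the discriminator, l_ret + l_adv update the encoders."""
    # Negatives by shifting the batch one position (an assumption;
    # the cited papers mine harder negatives).
    neg_rec = rec_emb.roll(1, dims=0)
    neg_img = img_emb.roll(1, dims=0)
    # Bidirectional triplet (retrieval) loss.
    l_ret = triplet(img_emb, rec_emb, neg_rec) + triplet(rec_emb, img_emb, neg_img)
    # Discriminator loss: label images 1, recipes 0.
    logits = disc(torch.cat([img_emb, rec_emb], dim=0))
    labels = torch.cat([torch.ones(len(img_emb), 1),
                        torch.zeros(len(rec_emb), 1)], dim=0)
    l_disc = F.binary_cross_entropy_with_logits(logits, labels)
    # Adversarial loss for the encoders: fool the discriminator (flipped labels).
    l_adv = F.binary_cross_entropy_with_logits(logits, 1.0 - labels)
    return l_ret, l_disc, l_adv
```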