A daily dietary assessment method named 24hour dietary recall has commonly been used in nutritional epidemiology studies to capture detailed information of the food eaten by the participants to help understand their dietary behaviour. However, in this self-reporting technique, the food types and the portion size reported highly depends on users' subjective judgement which may lead to a biased and inaccurate dietary analysis result. As a result, a variety of visual-based dietary assessment approaches have been proposed recently. While these methods show promises in tackling issues in nutritional epidemiology studies, several challenges and forthcoming opportunities, as detailed in this study, still exist. This study provides an overview of computing algorithms, mathematical models and methodologies used in the field of image-based dietary assessment. It also provides a comprehensive comparison of the state of the art approaches in food recognition and volume/weight estimation in terms of their processing speed, model accuracy, efficiency and constraints. It will be followed by a discussion on deep learning method and its efficacy in dietary assessment. After a comprehensive exploration, we found that integrated dietary assessment systems combining with different approaches could be the potential solution to tackling the challenges in accurate dietary intake assessment.
An objective dietary assessment system can help users to understand their dietary behavior and enable targeted interventions to address underlying health problems. To accurately quantify dietary intake, measurement of the portion size or food volume is required. For volume estimation, previous research studies mostly focused on using model-based or stereo-based approaches which rely on manual intervention or require users to capture multiple frames from different viewing angles which can be tedious. In this paper, a view synthesis approach based on deep learning is proposed to reconstruct 3D point clouds of food items and estimate the volume from a single depth image. A distinct neural network is designed to use a depth image from one viewing angle to predict another depth image captured from the corresponding opposite viewing angle. The whole 3D point cloud map is then reconstructed by fusing the initial data points with the synthesized points of the object items through the proposed point cloud completion and Iterative Closest Point (ICP) algorithms. Furthermore, a database with depth images of food object items captured from different viewing angles is constructed with image rendering and used to validate the proposed neural network. The methodology is then evaluated by comparing the volume estimated by the synthesized 3D point cloud with the ground truth volume of the object items.
Dietary assessment is an important tool for nutritional epidemiology studies. To assess the dietary intake, the common approach is to carry out 24-h dietary recall (24HR), a structured interview conducted by experienced dietitians. Due to the unconscious biases in such self-reporting methods, many research works have proposed the use of vision-based approaches to provide accurate and objective assessments. In this article, a novel vision-based method based on real-time three-dimensional (3-D) reconstruction and deep learning view synthesis is proposed to enable accurate portion size estimation of food items consumed. A point completion neural network is developed to complete partial point cloud of food items based on a single depth image or video captured from any convenient viewing position. Once 3-D models of food items are reconstructed, the food volume can be estimated through meshing. Compared to previous methods, our method has addressed several major challenges in vision-based dietary assessment, such as view occlusion and scale ambiguity, and it outperforms previous approaches in accurate portion size estimation.
A novel vision-based approach for estimating individual dietary intake in food sharing scenarios is proposed in this paper, which incorporates food detection, face recognition and hand tracking techniques. The method is validated using panoramic videos which capture subjects' eating episodes. The results demonstrate that the proposed approach is able to reliably estimate food intake of each individual as well as the food eating sequence. To identify the food items ingested by the subject, a transfer learning approach is designed. 4, 200 food images with segmentation masks, among which 1, 500 are newly annotated, are used to fine-tune the deep neural network for the targeted food intake application. In addition, a method for associating detected hands with subjects is developed and the outcomes of face recognition are refined to enable the quantification of individual dietary intake in communal eating settings.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.