Image classification models built into visual support systems and other assistive devices need to provide accurate predictions about their environment. We focus on an application of assistive technology for people with visual impairments, supporting daily activities such as shopping or cooking. In this paper, we provide a new benchmark dataset for a challenging task in this application: classification of fruits, vegetables, and refrigerated products, e.g., milk packages and juice cartons, in grocery stores. To enable the learning process to utilize multiple sources of structured information, this dataset contains not only a large volume of natural images but also the corresponding product information from an online shopping website. This information encompasses the hierarchical structure of the object classes as well as an iconic image of each type of object. The dataset can be used to train and evaluate image classification models for helping visually impaired people in natural environments. Additionally, we provide benchmark results for pretrained convolutional neural networks commonly used for image understanding, as well as for a multi-view variational autoencoder, which can exploit the rich product information in the dataset.
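The benchmark results mentioned above come down to measuring classification accuracy on held-out images. As a minimal, self-contained sketch (the function name and toy logits are mine, not from the paper), top-k accuracy over a batch of classifier outputs can be computed as:

```python
import numpy as np

def top_k_accuracy(logits, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    # argsort descending; keep the k best class indices per sample
    top_k = np.argsort(-logits, axis=1)[:, :k]
    hits = np.any(top_k == labels[:, None], axis=1)
    return hits.mean()

# toy example: 3 samples, 4 classes (values invented for illustration)
logits = np.array([[0.1, 2.0, 0.3, 0.0],
                   [1.5, 0.2, 0.6, 0.4],
                   [0.0, 0.1, 0.2, 3.0]])
labels = np.array([1, 2, 3])
top1 = top_k_accuracy(logits, labels, k=1)  # sample 2 is misclassified: 2/3
top5_style = top_k_accuracy(logits, labels, k=2)  # all true labels in top-2
```

Reporting top-1 alongside top-k is standard for fine-grained grocery classes, where visually similar items (e.g., different milk packages) often land among each other's top predictions.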
Collaborative filtering is a method for making predictions about consumer interests by collecting preferences or opinions from other consumers. For this purpose, statistical modeling techniques are applied to learn a personalized model for each consumer based on every purchase made or rating given to the available items. One such technique is probabilistic latent semantic analysis (pLSA), which in this thesis is used to group consumers by similarities in their movie preferences in order to improve personalized rating predictions for unseen movies. The main challenge with pLSA in collaborative filtering is overfitting, which results in model parameters that are strictly determined by past ratings and thus give unreliable predictions for unknown data. To counteract the overfitting, a regularization method called conjugate-prior regularization is proposed, which introduces additional information about the proportions of the model parameters. It is shown that the proposed regularization provides more robust learning from sparse data sets and also improves the recommendation performance on discrete ratings.

Acknowledgments First and foremost, I want to express my gratitude to my supervisors, Stefan Ingi Adalbjörnsson and Søren Vang Andersen, for all the encouragement and trust they have given me during these two semesters. They have inspired me to always do my best and to challenge myself when it comes to learning and understanding new material. I look back on this time, even the difficult moments during the work, with great joy, and I hope that I get the opportunity to work with them again in the future. Secondly, I want to thank Andreas Jakobsson and Johan Swärd for their welcoming and positive presence during my thesis work, and Magnus Örn Berg for inspiring me to write this thesis and allowing me to use and extend parts of his code.
I would also like to thank Carl-Gustaf Werner for giving me a better computer and for helping me with rookie problems in Linux. Last but by no means least, I want to express my gratitude to the other Master's thesis students I have met at the Department of Mathematics: Lea, Linus, Gabrielle, David, Edvard, Carolina, Jonas, Matilda, Anna, Kajsa, Teo, Erik, and Josephine. Thank you for motivating me to keep a hard-working attitude and for keeping up a happy and cheerful atmosphere during both working hours and coffee breaks. Studying side by side with you has helped me a lot in writing this thesis.
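The conjugate-prior regularization summarized in the abstract above can be read as MAP estimation with Dirichlet pseudo-counts in the EM updates of the pLSA aspect model. A minimal sketch, assuming a user-item count matrix and a single smoothing hyperparameter alpha (the exact parameterization and rating model in the thesis may differ):

```python
import numpy as np

def plsa_map(counts, n_topics, alpha=1.5, n_iter=50, seed=0):
    """EM for the pLSA aspect model p(item|user) = sum_z p(z|user) p(item|z),
    with Dirichlet pseudo-counts (alpha - 1) added in each M-step (MAP estimate)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = counts.shape
    p_z_u = rng.dirichlet(np.ones(n_topics), size=n_users)   # p(z|u), rows sum to 1
    p_v_z = rng.dirichlet(np.ones(n_items), size=n_topics)   # p(v|z), rows sum to 1
    for _ in range(n_iter):
        # E-step: responsibilities p(z|u,v), shape (users, topics, items)
        joint = p_z_u[:, :, None] * p_v_z[None, :, :]
        resp = joint / joint.sum(axis=1, keepdims=True)
        # M-step with conjugate-prior smoothing: pseudo-counts keep the
        # parameters away from zero when the rating data is sparse
        weighted = counts[:, None, :] * resp                 # expected counts
        p_z_u = weighted.sum(axis=2) + (alpha - 1)
        p_z_u /= p_z_u.sum(axis=1, keepdims=True)
        p_v_z = weighted.sum(axis=0) + (alpha - 1)
        p_v_z /= p_v_z.sum(axis=1, keepdims=True)
    return p_z_u, p_v_z

# toy data: two consumer groups with disjoint movie tastes
counts = np.array([[5, 4, 0, 0],
                   [4, 5, 0, 0],
                   [0, 0, 5, 4],
                   [0, 0, 4, 5]], dtype=float)
p_z_u, p_v_z = plsa_map(counts, n_topics=2)
pred = p_z_u @ p_v_z  # p(item|user); each row is a proper distribution
```

Without the pseudo-counts (alpha = 1), items a user has never rated can be driven to exactly zero probability, which is the overfitting behavior the regularization counteracts.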
Summary An essential task for computer vision-based assistive technologies is to help visually impaired people recognize objects in constrained environments, for instance, recognizing food items in grocery stores. In this paper, we introduce a novel dataset with natural images of groceries—fruits, vegetables, and packaged products—where all images have been taken inside grocery stores to resemble a shopping scenario. Additionally, we download iconic images and text descriptions for each item that can be utilized for better representation learning of groceries. We select a multi-view generative model that can combine the different sources of item information into lower-dimensional representations. The experiments show that utilizing the additional information yields higher accuracies on classifying grocery items than using the natural images alone. We observe that iconic images help to construct representations separated by visual differences between the items, while text descriptions enable the model to distinguish between visually similar items by their ingredients.
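One common way for a multi-view generative model to combine per-view information over a shared latent space is a precision-weighted product of Gaussian experts. This is a generic sketch of that idea under the assumption of diagonal Gaussian posteriors per view, not necessarily the exact fusion mechanism used in the paper:

```python
import numpy as np

def product_of_gaussians(mus, variances):
    """Combine per-view Gaussian posteriors N(mu_i, var_i) over a shared latent
    into a single Gaussian by precision-weighted averaging (product of experts)."""
    precisions = 1.0 / np.asarray(variances)
    var = 1.0 / precisions.sum(axis=0)                    # combined variance
    mu = var * (precisions * np.asarray(mus)).sum(axis=0) # precision-weighted mean
    return mu, var

# two views of a 2-d latent: a cluttered natural image (uncertain)
# and a clean iconic image (confident); numbers are illustrative
mu_img, var_img = np.array([0.0, 2.0]), np.array([4.0, 4.0])
mu_icon, var_icon = np.array([1.0, 0.0]), np.array([1.0, 1.0])
mu, var = product_of_gaussians([mu_img, mu_icon], [var_img, var_icon])
```

The fused mean lands closer to the view with lower variance, matching the intuition that a clean iconic image constrains the latent representation more than a cluttered in-store photograph.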