2019
DOI: 10.1109/access.2019.2923552
DeepStyle: Multimodal Search Engine for Fashion and Interior Design

Abstract: In this paper, we propose a multimodal search engine that combines visual and textual cues to retrieve items from a multimedia database aesthetically similar to the query. The goal of our engine is to enable intuitive retrieval of fashion merchandise such as clothes or furniture. Existing search engines treat textual input only as an additional source of information about the query image and do not correspond to the real-life scenario where the user looks for "the same shirt but of denim". Our novel method, du…
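The "same shirt but of denim" scenario from the abstract can be illustrated as blending an image embedding with a text-modifier embedding into a single query vector, then ranking the database by cosine similarity. The linear blend, the `alpha` weight, and the function names below are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def blend_query(image_emb, text_emb, alpha=0.5):
    """Blend an image embedding with a text-modifier embedding into one
    query vector. The linear blend and `alpha` are assumptions made for
    illustration, not the method described in the paper."""
    q = alpha * np.asarray(image_emb, dtype=float) \
        + (1.0 - alpha) * np.asarray(text_emb, dtype=float)
    return q / np.linalg.norm(q)  # unit-normalize for cosine retrieval

def retrieve(query, database, k=3):
    """Rank database rows by cosine similarity to the query and
    return the indices of the top-k matches."""
    db = np.asarray(database, dtype=float)
    db = db / np.linalg.norm(db, axis=1, keepdims=True)
    return np.argsort(-(db @ query))[:k]
```

For example, blending a "shirt" image vector with a "denim" text vector yields a query whose nearest neighbors share aspects of both inputs.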

Cited by 53 publications (16 citation statements)
References 40 publications (52 reference statements)
“…However, our convolution network is novel in learning not only content features but also each user's preferences via the end-to-end framework with the triplet loss. Compared with recent methods for Web content retrieval [41,69] that use content features extracted from the pre-trained convolution network, this novelty is unique. Fig.…”
Section: Feature Extraction Methods for Real-World Application
Mentioning confidence: 99%
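The triplet loss mentioned in this excerpt can be sketched in a few lines. This is a generic squared-Euclidean variant with an assumed margin, not necessarily the exact formulation used in the citing work:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: encourage the positive example to lie
    closer to the anchor than the negative by at least `margin`.
    The margin value and distance metric are assumptions."""
    d_pos = np.sum((anchor - positive) ** 2)  # squared distance to positive
    d_neg = np.sum((anchor - negative) ** 2)  # squared distance to negative
    return max(d_pos - d_neg + margin, 0.0)
```

In an end-to-end setup, this loss would be minimized over the network's embedding outputs so that items a given user prefers cluster near that user's anchor representation.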
“…Image classification models often employ CNNs to extract features related to shapes and textures in an image and to generate predictions of relevant attributes [17]. Similar ideas have been extended to clothing recommendation [18] and fashion image retrieval [19], [20]. To improve the performance of classification, several studies introduced multi-task learning (MTL) into their methods [21], [22].…”
Section: A. Fashion Attribute Recognition
Mentioning confidence: 99%
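The multi-task learning (MTL) setup referenced above, one shared feature extractor feeding several attribute heads, can be sketched minimally. The linear layers, dimensions, and head names are hypothetical stand-ins for a CNN backbone:

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_features(x, w_shared):
    """Shared backbone: one linear layer + ReLU, standing in for a CNN."""
    return np.maximum(x @ w_shared, 0.0)

def multi_task_forward(x, w_shared, w_category, w_texture):
    """Two task-specific heads on top of shared features, as in MTL."""
    h = shared_features(x, w_shared)
    return h @ w_category, h @ w_texture

x = rng.normal(size=(1, 8))            # one 8-dim input feature vector
w_shared = rng.normal(size=(8, 4))     # shared backbone weights
w_cat = rng.normal(size=(4, 5))        # head 1: e.g. 5 clothing categories
w_tex = rng.normal(size=(4, 3))        # head 2: e.g. 3 texture attributes
cat_logits, tex_logits = multi_task_forward(x, w_shared, w_cat, w_tex)
```

Sharing the backbone lets gradients from both attribute tasks shape one feature space, which is the usual motivation for MTL in attribute recognition.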
“…Designers and users put forward their requirements through images and text, search for related product images from databases or e-commerce websites, and the matched images will be recommended to designers and users as design references. The retrieval input can be text, images, or both of them [162,163,164,165,166]. For product, the input image provided by designers and users may be taken by their phone on the street or in a store, which is quite different from image databases and e-commerce websites in terms of shooting angle, condition, background, or posture [167,168,169,170,171].…”
Section: Product Design Based on Image Data
Mentioning confidence: 99%