2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2017.444

Video2Shop: Exact Matching Clothes in Videos to Online Shopping Images

Abstract: In recent years, both online retail and video hosting services have been growing exponentially. In this paper, we explore a new cross-domain task, Video2Shop, which targets matching clothes appearing in videos to the exact same items in online shops. A novel deep neural network, called AsymNet, is proposed to explore this problem. For the image side, well-established methods are used to detect and extract features for clothing patches with arbitrary sizes. For the video side, deep visual features are extracted from det…
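As a rough illustration of the cross-domain retrieval step the abstract describes (clothing features from a video clip matched against features of online-shop images), the sketch below ranks shop items by cosine similarity to a mean-pooled video query. It is not the paper's AsymNet: the function names, feature dimensions, and the simple mean-pooling fusion are illustrative assumptions only.

```python
import torch
import torch.nn.functional as F

def rank_shop_items(video_clip_features: torch.Tensor,
                    shop_item_features: torch.Tensor) -> torch.Tensor:
    """Return shop-item indices sorted by similarity to the video query.

    video_clip_features: (T, D) features of T detected clothing regions
                         across the video clip (assumed precomputed).
    shop_item_features:  (M, D) features of M online-shop product images.
    """
    # Aggregate the clip into a single query vector. Plain mean pooling is an
    # assumption here; AsymNet's learned fusion is not reproduced.
    query = F.normalize(video_clip_features.mean(dim=0, keepdim=True), dim=1)
    gallery = F.normalize(shop_item_features, dim=1)
    scores = (query @ gallery.t()).squeeze(0)   # cosine similarity per shop item
    return scores.argsort(descending=True)      # best exact-match candidates first

# Toy usage with random 256-dimensional features.
ranking = rank_shop_items(torch.randn(8, 256), torch.randn(100, 256))
print(ranking[:5])  # indices of the five most similar shop images
```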


Citations: cited by 72 publications (48 citation statements).
References: 30 publications (60 reference statements).
“…Street2Shop [25], DARN [22], DeepFashion […] [22,25,34,24,44,9,8,58]. These methods usually follow a global similarity computation and matching pipeline, i.e.…”
Section: Datasets (mentioning)
confidence: 99%
“…There also exist some variants, such as dialog-based clothes search [17], video-based clothes retrieval [8], and attribute-feedback-based clothes retrieval [18,59]. Their application scenarios and settings are different from ours.…”
Section: Datasets (mentioning)
confidence: 99%
“…[43,26]. Another line of work considers retrieving fashion images based on various forms of queries, including images [26,35,52], attributes [8,1], occasions [24], videos [6], and user preferences [16]. Our work is closer to the 'cross-scenario' fashion retrieval setting (called street2shop) which seeks to retrieve fashion products appearing in street photos [25,17], as the same type of data can be adapted to our setting.…”
Section: Related Work (mentioning)
confidence: 99%
“…We use convolutional layers with one output channel to reduce the feature dimension. Since training images have different sizes, and inspired by previous work [14], one spatial pyramid pooling (SPP) layer is applied to reshape the features from the last convolutional layer into a fixed dimension. Finally, two fully connected layers are employed as a classifier.…”
Section: Network Architectures (mentioning)
confidence: 99%
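A minimal sketch of the design this citation statement describes: a one-output-channel convolution to reduce the feature dimension, a spatial pyramid pooling (SPP) layer to turn variable-size feature maps into a fixed-length vector, and two fully connected layers as the classifier. PyTorch, the pyramid levels, and the layer sizes are assumptions, not the citing paper's exact network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPPClassifier(nn.Module):
    def __init__(self, in_channels=512, levels=(1, 2, 4), hidden=256, num_classes=10):
        super().__init__()
        # "convolutional layers with one output channel to reduce the feature dimension"
        self.reduce = nn.Conv2d(in_channels, 1, kernel_size=1)
        self.levels = levels
        spp_dim = sum(l * l for l in levels)  # fixed SPP output length (single channel)
        # "two fully connected layers are employed as a classifier"
        self.fc1 = nn.Linear(spp_dim, hidden)
        self.fc2 = nn.Linear(hidden, num_classes)

    def spp(self, x):
        # Pool the feature map at several grid sizes and concatenate, so inputs
        # of different spatial size all yield a vector of the same length.
        pooled = [F.adaptive_max_pool2d(x, l).flatten(1) for l in self.levels]
        return torch.cat(pooled, dim=1)

    def forward(self, feat_map):
        x = self.reduce(feat_map)   # (N, 1, H, W); H and W may vary per batch
        x = self.spp(x)             # (N, sum(l*l))
        x = F.relu(self.fc1(x))
        return self.fc2(x)          # class logits

# Feature maps of different spatial sizes map to the same output shape.
net = SPPClassifier()
print(net(torch.randn(2, 512, 13, 13)).shape)  # torch.Size([2, 10])
print(net(torch.randn(2, 512, 7, 19)).shape)   # torch.Size([2, 10])
```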