2017
DOI: 10.1109/tmm.2017.2690144
Mining Fashion Outfit Composition Using an End-to-End Deep Learning Approach on Set Data

Abstract: Composing fashion outfits involves deep understanding of fashion standards while incorporating creativity for choosing multiple fashion items (e.g., Jewelry, Bag, Pants, Dress). In fashion websites, popular or high-quality fashion outfits are usually designed by fashion experts and followed by large audiences. In this paper, we propose a machine learning system to compose fashion outfits automatically. The core of the proposed automatic composition system is to score fashion outfit candidates based on the appe…

Citation Types: 3 supporting, 149 mentioning, 0 contrasting

Cited by 188 publications (152 citation statements)
References 28 publications
“…Methods belonging to the second category, such as presented in [11] and [3], are based on modeling a fashion outfit as a set or an ordered sequence. Li et al. deploy an end-to-end deep learning system which can classify a given outfit as popular or unpopular [11]. Han et al. train a bidirectional LSTM model to sequentially generate outfits [3].…”
Section: Fashion Outfit Generation (mentioning)
confidence: 99%
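To make the sequence view quoted above concrete, here is a minimal PyTorch sketch in the rough shape of the bidirectional-LSTM approach of Han et al. [3]: an outfit is read as an ordered list of item feature vectors, and the model is trained to reconstruct each item from its surrounding context. All class names, dimensions, and the cosine scoring rule are illustrative assumptions, not code from the cited papers.

```python
# Illustrative sketch only: dimensions and scoring rule are assumed,
# not taken from the cited papers.
import torch
import torch.nn as nn

class BiLSTMOutfitModel(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=256):
        super().__init__()
        # A bidirectional LSTM reads the outfit as an ordered sequence.
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        # Map the hidden state back to item-feature space so the model
        # can be trained to predict each item from its context.
        self.proj = nn.Linear(2 * hidden_dim, feat_dim)

    def forward(self, item_feats):            # (batch, seq_len, feat_dim)
        hidden, _ = self.lstm(item_feats)     # (batch, seq_len, 2*hidden_dim)
        return self.proj(hidden)              # predicted item features

# Scoring an outfit: higher similarity between predicted and actual
# item features indicates a more plausible (compatible) sequence.
model = BiLSTMOutfitModel()
outfit = torch.randn(1, 5, 512)               # 5 items, e.g. CNN features
pred = model(outfit)
score = torch.cosine_similarity(pred, outfit, dim=-1).mean()
```

The same scoring idea supports generation: at each step, the item (from a candidate pool) whose features best match the model's prediction is appended to the sequence.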
“…The concept of fashion relies mostly on visual and textual information. Most previous works suggest to leverage image and text to learn multi-modal embeddings [3,11]. In our work, we use … We mask the items in the outfit one at a time.…”
Section: Multi-modal Embedding (mentioning)
confidence: 99%
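The statement above combines two ingredients: a joint image-text embedding per item, and a masking scheme that hides one item at a time. The following is a minimal PyTorch sketch under assumed inputs (precomputed CNN image features and averaged word vectors); the projection layers, the mean-of-context predictor, and the cosine loss are illustrative choices, not the cited method.

```python
# Illustrative sketch only: encoders, dimensions, and the loss are
# assumptions, not the cited papers' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiModalEmbedding(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=300, emb_dim=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, emb_dim)  # e.g. CNN features
        self.txt_proj = nn.Linear(txt_dim, emb_dim)  # e.g. mean word vectors

    def forward(self, img_feats, txt_feats):
        # Fuse the two modalities into one embedding per item.
        fused = self.img_proj(img_feats) + self.txt_proj(txt_feats)
        return F.normalize(fused, dim=-1)

def masked_item_loss(item_embs):
    """Mask each item in turn and predict it from the mean of the rest."""
    n = item_embs.size(0)
    loss = 0.0
    for i in range(n):
        context = torch.cat([item_embs[:i], item_embs[i + 1:]])  # drop item i
        pred = F.normalize(context.mean(dim=0), dim=0)
        loss = loss + (1 - F.cosine_similarity(pred, item_embs[i], dim=0))
    return loss / n

encoder = MultiModalEmbedding()
img = torch.randn(4, 2048)   # image features for a 4-item outfit
txt = torch.randn(4, 300)    # text features for the same items
loss = masked_item_loss(encoder(img, txt))
```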
“…For end-to-end training on outfits, pooling [14] or concatenation [27] operations can be used to aggregate multiple item features. Then a multilayer perceptron (MLP) can be used to compute a compatibility score.…”
Section: Visual Compatibility Learning (mentioning)
confidence: 99%
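As a concrete illustration of the two aggregation strategies named above, this PyTorch sketch contrasts mean-pooling (permutation-invariant, so the outfit is treated as a set) with concatenation (a fixed number of item slots), each feeding an MLP that emits a compatibility score. Layer sizes and the sigmoid output are illustrative assumptions.

```python
# Illustrative sketch only: dimensions and layer sizes are assumed.
import torch
import torch.nn as nn

class PoolingScorer(nn.Module):
    """Set view: average item features, then score with an MLP."""
    def __init__(self, feat_dim=512, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, item_feats):              # (batch, n_items, feat_dim)
        pooled = item_feats.mean(dim=1)         # order-invariant aggregation
        return torch.sigmoid(self.mlp(pooled))  # compatibility in (0, 1)

class ConcatScorer(nn.Module):
    """Slot view: concatenate a fixed number of items, then score."""
    def __init__(self, feat_dim=512, n_items=4, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim * n_items, hidden),
                                 nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, item_feats):              # (batch, n_items, feat_dim)
        flat = item_feats.flatten(start_dim=1)  # fixed-size concatenation
        return torch.sigmoid(self.mlp(flat))

outfit = torch.randn(8, 4, 512)                 # batch of 4-item outfits
print(PoolingScorer()(outfit).shape, ConcatScorer()(outfit).shape)
```

One practical consequence of the design choice: the pooling scorer accepts outfits of any length, while the concatenation scorer must fix the number of items in advance and is sensitive to item order.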
“…Method               AUC (%)         FITB accuracy (%)
Pooling [14]           88.35 ± 0.26    57.28 ± 0.31
Concatenation [27]     83.40 ± 0.48    52.91 ± 0.59
Self-Attention [32]    79.65 ± 0.68    48.60 ± 0.70
CSN [28]               84.90 ± 0.52    57.06 ± 1.70
BiLSTM [5]             74.44 ± 0.95    45.41 ± 0.40
BiLSTM+VSE [5]         74.82 ± 0.63    46.02 ± 0.62
Ours                   91.90 ± 0.40    64.35 ± 0.92

Table 2: Outfit compatibility prediction AUC and FITB accuracy on the Polyvore-T dataset.…”
Section: Comparison With Projected Embedding (mentioning)
confidence: 99%