Weakly-supervised Learning of Mid-level Features for Pedestrian Attribute Recognition and Localization

Yu, Kai; Leng, Biao; Zhang, Zhang; Li, Dangwei; Huang, Kaiqi

doi:10.48550/arxiv.1611.05603

Cited by 11 publications

(18 citation statements)

References 18 publications


“…Refer to (Li et al 2016a) for their definitions and explanations. We compare our approach with 14 existing counterparts, including HPNet (Liu et al 2017), JRL (Wang et al 2017), VeSPA (Sarfraz et al 2017), WPAL (Yu et al 2016), GAM (Fabbri, Calderara, and Cucchiara 2017), GRL (Zhao et al 2018), LGNet (Liu et al 2018), PGDM (Li et al 2018), VSGR, RCRA (Zhao et al 2019), I²ANet (Ji et al 2019), JLPLS (Tan et al 2019), CoCNN, and DCL (Wang et al 2019), as shown in Table 2. The samples in the RAP dataset are collected from real world surveillance scenarios, and compared to the ones in WIDER-Attribute, there are less distractions.…”

Section: Results (mentioning)

confidence: 99%

“…When the total number of concerned attributes increases, the influence of the class imbalance problem can no longer be neglected. We thus also employ the weighted BCE-loss (Yu et al 2016) as:…”

Section: Training Scheme (mentioning)

confidence: 99%

“…In the literature, the majority of existing efforts to HAR have been made on building effective features, and a large number of works focus on improving the discrimination and the robustness of representations of appearance properties. Features are evolving from handcrafted ones (Joo, Wang, and Zhu 2013;Cao et al 2008) to deep learned ones (Zhu et al 2017b;Yu et al 2016), with promising performance achieved.…”

Section: Introduction (mentioning)

confidence: 99%


Distraction-Aware Feature Learning for Human Attribute Recognition via Coarse-to-Fine Attention Mechanism

Huang, Guo, et al. 2019. Preprint

Recently, Human Attribute Recognition (HAR) has become a hot topic due to its scientific challenges and application potentials, where localizing attributes is a crucial stage but not well handled. In this paper, we propose a novel deep learning approach to HAR, namely Distraction-aware HAR (Da-HAR). It enhances deep CNN feature learning by improving attribute localization through a coarse-to-fine attention mechanism. At the coarse step, a self-mask block is built to roughly discriminate and reduce distractions, while at the fine step, a masked attention branch is applied to further eliminate irrelevant regions. Thanks to this mechanism, feature learning is more accurate, especially when heavy occlusions and complex backgrounds exist. Extensive experiments are conducted on the WIDER-Attribute and RAP databases, and state-of-the-art results are achieved, demonstrating the effectiveness of the proposed approach.


“…Note that body parts are related to semantic attributes which are often specific to different body parts. A number of attributes based re-id models [43,36,51,11] have been proposed. They use attributes to provide additional supervision for learning identity-sensitive features.…”

Section: Related Work (mentioning)

confidence: 99%

Pose-Normalized Image Generation for Person Re-identification

Qian, Xiang, et al. 2018. Computer Vision – ECCV 2018

Person Re-identification (re-id) faces two major challenges: the lack of cross-view paired training data and learning discriminative identity-sensitive and view-invariant features in the presence of large pose variations. In this work, we address both problems by proposing a novel deep person image generation model for synthesizing realistic person images conditional on pose. The model is based on a generative adversarial network (GAN) designed specifically for pose normalization in re-id, thus termed pose-normalization GAN (PN-GAN). With the synthesized images, we can learn a new type of deep re-id feature free of the influence of pose variations. We show that this feature is strong on its own and complementary to features learned with the original images. Importantly, under the transfer learning setting, we show that our model generalizes well to any new re-id dataset without the need for collecting any training data for model fine-tuning. The model thus has the potential to make re-id model truly scalable.


“…As discussed in [24], [25], [26], SPD matrix transformation networks are capable of achieving the better performance than the original SPD matrix. Inspired by [37] and [21], we add a learnable layer to make the network more flexible and more adaptive to the specific task. Based on the SPD matrix generated by the kernel aggregation layer, we expect to transform the existing SPD representation to be a more discriminative, suitable and desirable matrix.…”

Section: SPD Matrix Transformation Layer (mentioning)

confidence: 99%

Learning a robust representation via a deep network on symmetric positive definite manifolds

Gao et al. 2019. Pattern Recognition

Recent studies have shown that aggregating convolutional features of a pre-trained Convolutional Neural Network (CNN) can obtain impressive performance for a variety of visual tasks. The symmetric Positive Definite (SPD) matrix becomes a powerful tool due to its remarkable ability to learn an appropriate statistic representation to characterize the underlying structure of visual features. In this paper, we propose to aggregate deep convolutional features into an SPD matrix representation through the SPD generation and the SPD transformation under an end-to-end deep network. To this end, several new layers are introduced in our network, including a nonlinear kernel aggregation layer, an SPD matrix transformation layer, and a vectorization layer. The nonlinear kernel aggregation layer is employed to aggregate the convolutional features into a real SPD matrix directly. The SPD matrix transformation layer is designed to construct a more compact and discriminative SPD representation. The vectorization and normalization operations are performed in the vectorization layer for reducing the redundancy and accelerating the convergence. The SPD matrix in our network can be considered as a mid-level representation bridging convolutional features and high-level semantic features. To demonstrate the effectiveness of our method, we conduct extensive experiments on visual classification. Experiment results show that our method notably outperforms state-of-the-art methods.
