Proceedings of the 26th ACM International Conference on Multimedia 2018
DOI: 10.1145/3240508.3240523

Fine-Grained Representation Learning and Recognition by Exploiting Hierarchical Semantic Embedding

Abstract: Object categories inherently form a hierarchy with different levels of concept abstraction, especially for fine-grained categories. For example, birds (Aves) can be categorized according to a four-level hierarchy of order, family, genus, and species. This hierarchy encodes rich correlations among various categories across different levels, which can effectively regularize the semantic space and thus make prediction less ambiguous. However, previous studies of fine-grained image recognition primarily focus on categorie…
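The hierarchy-aware prediction the abstract describes can be made concrete with a small sketch. Below is a minimal, hypothetical example of hierarchical prediction heads in PyTorch, where each level's classifier is conditioned on the predicted score vector of its parent level; the level sizes and the concatenation scheme are illustrative assumptions, not the paper's exact HSE architecture.

```python
# Minimal sketch of hierarchy-aware prediction heads (not the authors'
# exact HSE architecture): each level's classifier is conditioned on the
# predicted score vector of its parent level, as the abstract describes.
# Level sizes and the concatenation scheme are illustrative assumptions.
import torch
import torch.nn as nn

class HierarchicalHead(nn.Module):
    def __init__(self, feat_dim=512, level_sizes=(13, 37, 122, 200)):
        super().__init__()
        self.levels = nn.ModuleList()
        prev = 0  # no prior for the coarsest level (order)
        for n_classes in level_sizes:  # order, family, genus, species
            self.levels.append(nn.Linear(feat_dim + prev, n_classes))
            prev = n_classes  # the next level sees this level's scores as prior

    def forward(self, feat):
        scores, prior = [], None
        for head in self.levels:
            x = feat if prior is None else torch.cat([feat, prior], dim=1)
            logits = head(x)
            scores.append(logits)
            prior = logits.softmax(dim=1)  # soft prediction guides the next level
        return scores  # one score vector per hierarchy level

# usage: feat = backbone(image); order_s, family_s, genus_s, species_s = head(feat)
```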

Cited by 84 publications (50 citation statements)
References 42 publications (98 reference statements)
“…In Table 3, we can see that SCBA achieves better results than other weakly supervised approaches, and even some supervised methods, e.g., Mask-CNN [44] and HSnet [45], which demonstrates the validity of our model. Compared with HSE [18], which used hierarchical semantic embedding to learn stronger representations of fine-grained features, our SCBA improves relative accuracy by 0.16%. We even surpass TASN [46] and DCL [47], recently proposed state-of-the-art weakly supervised models, with relative accuracy improvements of 0.36% and 0.46%, respectively.…”
Section: Results on CUB-200-2011
Confidence: 99%
“…Generally speaking, the aims of fine-grained classification methods are high accuracy and low computational cost. To achieve better performance, Chen et al. [18] exploited a semantic-guided attention mechanism to learn more discriminative regions at each level by incorporating the predicted score vector of the higher level as prior knowledge. Wang et al. [19] proposed adding supervision information to filters to optimize discriminative part detectors and further localize the key regions.…”
Section: A. Weakly-Supervised Fine-Grained Image Recognition
Confidence: 99%
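The statement above summarizes how HSE [18] uses the higher level's score vector as a prior to guide attention at the current level. A hedged sketch of one way such score-guided attention could be wired, assuming the parent scores are embedded into a query that weights spatial feature locations (all dimensions are hypothetical, and this is not Chen et al.'s exact formulation):

```python
# Hedged sketch of score-guided attention in the spirit of the statement
# above: the parent level's predicted scores are embedded into a query and
# used to weight spatial feature locations before pooling for the current
# level. All dimensions are assumptions, not the paper's values.
import torch
import torch.nn as nn

class ScoreGuidedAttention(nn.Module):
    def __init__(self, feat_dim=512, parent_classes=37):
        super().__init__()
        self.embed = nn.Linear(parent_classes, feat_dim)  # scores -> query

    def forward(self, fmap, parent_scores):
        # fmap: (B, C, H, W) backbone features; parent_scores: (B, parent_classes)
        b, c, h, w = fmap.shape
        query = self.embed(parent_scores)                  # (B, C)
        attn = torch.einsum('bc,bchw->bhw', query, fmap)   # similarity per location
        attn = attn.flatten(1).softmax(dim=1).view(b, 1, h, w)
        return (fmap * attn).sum(dim=(2, 3))               # attended feature (B, C)
```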
“…• Coarse-to-fine CNN: Fu et al. [20] propose a coarse-to-fine layer by applying Bayesian techniques in the network, enabling the coarse-to-fine network to learn the hierarchical category tree.
• Tree-CNN: Roy et al. [24] propose an adaptive hierarchical network structure composed of DCNNs that can grow and learn as new data becomes available.…”
Section: Comparison with State-of-the-Art Methods
Confidence: 99%
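Both comparison methods above couple predictions along a category tree. As a generic illustration only, not Fu et al.'s Bayesian coarse-to-fine layer or Roy et al.'s growing Tree-CNN, coarse-level probabilities can be obtained by marginalizing fine-level probabilities through a fixed child-to-parent assignment:

```python
# Illustrative sketch of one common way to couple levels of a category
# tree: marginalize fine-level probabilities up to the coarse level through
# a fixed child-to-parent assignment. This is a generic construction, not
# the specific architectures cited above.
import torch

def coarse_from_fine(fine_probs, parent_of):
    # fine_probs: (B, n_fine); parent_of[j] = coarse index of fine class j
    n_coarse = int(max(parent_of)) + 1
    M = torch.zeros(len(parent_of), n_coarse)
    M[torch.arange(len(parent_of)), torch.tensor(parent_of)] = 1.0
    return fine_probs @ M  # (B, n_coarse): coarse prob = sum of child probs

# usage: genus_probs = coarse_from_fine(species_logits.softmax(1), species_to_genus)
```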
“…Recently, knowledge distillation [19], which trains a small student network to acquire the knowledge of a complex teacher network, has become a desirable alternative for model compression due to its broad applicability. Numerous works [9,42,46,49,69,75]…
Table 1: The Root Mean Squared Error (RMSE), parameters, FLOPs, and inference time of our SKT network and four state-of-the-art models on the UCF-QNRF [22] dataset. The FLOPs and parameters are computed with an input size of 2032×2912, and the inference times are measured on an Intel Xeon E5 CPU (2.4 GHz) and a single Nvidia GTX 1080 GPU.…”
Section: Introduction
Confidence: 99%
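The teacher-student objective mentioned above is commonly implemented as a temperature-softened KL-divergence term mixed with the standard cross-entropy. A minimal sketch of that standard formulation, where the temperature and mixing weight are illustrative choices:

```python
# Minimal sketch of the standard knowledge-distillation objective: the
# student matches the teacher's temperature-softened distribution via KL
# divergence, mixed with the usual hard-label cross-entropy. T and alpha
# are illustrative hyperparameter choices.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction='batchmean',
    ) * (T * T)  # rescale gradients to the hard-loss magnitude
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```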