Proceedings of the 26th ACM International Conference on Multimedia 2018
DOI: 10.1145/3240508.3240523

Fine-Grained Representation Learning and Recognition by Exploiting Hierarchical Semantic Embedding

Abstract: Object categories inherently form a hierarchy with different levels of concept abstraction, especially for fine-grained categories. For example, birds (Aves) can be categorized according to a four-level hierarchy of order, family, genus, and species. This hierarchy encodes rich correlations among various categories across different levels, which can effectively regularize the semantic space and thus make prediction less ambiguous. However, previous studies of fine-grained image recognition primarily focus on categorie…
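The hierarchy-aware prediction the abstract describes can be made concrete with a small sketch. Below is a minimal, hypothetical example of hierarchical prediction heads in PyTorch, where each level's classifier is conditioned on the predicted score vector of its parent level; the level sizes and the concatenation scheme are illustrative assumptions, not the paper's exact HSE architecture.

```python
# Minimal sketch of hierarchy-aware prediction heads (not the authors'
# exact HSE architecture): each level's classifier is conditioned on the
# predicted score vector of its parent level, as the abstract describes.
# Level sizes and the concatenation scheme are illustrative assumptions.
import torch
import torch.nn as nn

class HierarchicalHead(nn.Module):
    def __init__(self, feat_dim=512, level_sizes=(13, 37, 122, 200)):
        super().__init__()
        self.levels = nn.ModuleList()
        prev = 0  # no prior for the coarsest level (order)
        for n_classes in level_sizes:  # order, family, genus, species
            self.levels.append(nn.Linear(feat_dim + prev, n_classes))
            prev = n_classes  # the next level sees this level's scores as prior

    def forward(self, feat):
        scores, prior = [], None
        for head in self.levels:
            x = feat if prior is None else torch.cat([feat, prior], dim=1)
            logits = head(x)
            scores.append(logits)
            prior = logits.softmax(dim=1)  # soft prediction guides the next level
        return scores  # one score vector per hierarchy level

# usage: feat = backbone(image); order_s, family_s, genus_s, species_s = head(feat)
```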

Cited by 84 publications (50 citation statements)
References 42 publications (98 reference statements)
“…In Table 3, we can see that SCBA achieves better results than other weakly supervised approaches, and even some supervised methods, e.g., Mask-CNN [44] and HSnet [45], which demonstrates the validity of our model. Compared with HSE [18], which used hierarchical semantic embedding to learn stronger representations of fine-grained features, our SCBA improves relative accuracy by 0.16%. We even surpass TASN [46] and DCL [47], recently proposed state-of-the-art weakly supervised models, with relative accuracy improvements of 0.36% and 0.46%, respectively.…”
Section: Results on CUB-200-2011
Confidence: 99%
“…Generally speaking, the aims of fine-grained classification methods are high accuracy and low computational cost. To achieve better performance, Chen et al. [18] exploited a semantic-guided attention mechanism to learn more discriminative regions at each level by incorporating the predicted score vector of the higher level as prior knowledge. Wang et al. [19] proposed adding supervision information to filters to optimize discriminative part detectors and further localize the key regions.…”
Section: A. Weakly-Supervised Fine-Grained Image Recognition
Confidence: 99%
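The statement above summarizes how HSE [18] uses the higher level's score vector as a prior to guide attention at the current level. A hedged sketch of one way such score-guided attention could be wired, assuming the parent scores are embedded into a query that weights spatial feature locations (all dimensions are hypothetical, and this is not Chen et al.'s exact formulation):

```python
# Hedged sketch of score-guided attention in the spirit of the statement
# above: the parent level's predicted scores are embedded into a query and
# used to weight spatial feature locations before pooling for the current
# level. All dimensions are assumptions, not the paper's values.
import torch
import torch.nn as nn

class ScoreGuidedAttention(nn.Module):
    def __init__(self, feat_dim=512, parent_classes=37):
        super().__init__()
        self.embed = nn.Linear(parent_classes, feat_dim)  # scores -> query

    def forward(self, fmap, parent_scores):
        # fmap: (B, C, H, W) backbone features; parent_scores: (B, parent_classes)
        b, c, h, w = fmap.shape
        query = self.embed(parent_scores)                  # (B, C)
        attn = torch.einsum('bc,bchw->bhw', query, fmap)   # similarity per location
        attn = attn.flatten(1).softmax(dim=1).view(b, 1, h, w)
        return (fmap * attn).sum(dim=(2, 3))               # attended feature (B, C)
```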
“…• Coarse-to-fine CNN: Fu et al. [20] propose a coarse-to-fine layer by applying Bayesian techniques in the network, enabling the coarse-to-fine network to learn the hierarchical category tree.
• Tree-CNN: Roy et al. [24] propose an adaptive hierarchical network structure composed of DCNNs that can grow and learn as new data becomes available.…”
Section: Comparison with State-of-the-Art Methods
Confidence: 99%
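Both comparison methods above couple predictions along a category tree. As a generic illustration only, not Fu et al.'s Bayesian coarse-to-fine layer or Roy et al.'s growing Tree-CNN, coarse-level probabilities can be obtained by marginalizing fine-level probabilities through a fixed child-to-parent assignment:

```python
# Illustrative sketch of one common way to couple levels of a category
# tree: marginalize fine-level probabilities up to the coarse level through
# a fixed child-to-parent assignment. This is a generic construction, not
# the specific architectures cited above.
import torch

def coarse_from_fine(fine_probs, parent_of):
    # fine_probs: (B, n_fine); parent_of[j] = coarse index of fine class j
    n_coarse = int(max(parent_of)) + 1
    M = torch.zeros(len(parent_of), n_coarse)
    M[torch.arange(len(parent_of)), torch.tensor(parent_of)] = 1.0
    return fine_probs @ M  # (B, n_coarse): coarse prob = sum of child probs

# usage: genus_probs = coarse_from_fine(species_logits.softmax(1), species_to_genus)
```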
“…Recently, knowledge distillation [19], which trains a small student network to acquire the knowledge of a complex teacher network, has become a desirable alternative for model compression due to its broad applicability. Numerous works [9,42,46,49,69,75]…
Table 1: The Root Mean Squared Error (RMSE), parameters, FLOPs, and inference time of our SKT network and four state-of-the-art models on the UCF-QNRF [22] dataset. The FLOPs and parameters are computed with an input size of 2032×2912, and the inference times are measured on an Intel Xeon E5 CPU (2.4 GHz) and a single Nvidia GTX 1080 GPU.…”
Section: Introduction
Confidence: 99%
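The teacher-student objective mentioned above is commonly implemented as a temperature-softened KL-divergence term mixed with the standard cross-entropy. A minimal sketch of that standard formulation, where the temperature and mixing weight are illustrative choices:

```python
# Minimal sketch of the standard knowledge-distillation objective: the
# student matches the teacher's temperature-softened distribution via KL
# divergence, mixed with the usual hard-label cross-entropy. T and alpha
# are illustrative hyperparameter choices.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction='batchmean',
    ) * (T * T)  # rescale gradients to the hard-loss magnitude
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```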