2019
DOI: 10.1007/s11263-019-01176-2
Which and How Many Regions to Gaze: Focus Discriminative Regions for Fine-Grained Visual Categorization

Cited by 72 publications (33 citation statements)
References 56 publications
“…The object/part regions in an image are determined using existing object/part detectors, while the regions are described by deep learned features. Typical methods apply Part R-CNN [8], HSnet Search [9], NTS-Net [10], Spatial Relation [11], DCL [12] and FDR [13]. Although these methods effectively improve the accuracy, the detection of regions is generally computationally expensive.…”
Section: A. Regional Feature-based Methods
confidence: 99%
“…Fine-grained visual classification essentially focuses on representing visual differences between subcategories [48], [49]. The vast majority of researchers follow either a localization-classification manner or an end-to-end encoding fashion.…”
Section: Related Work, A. Fine-Grained Visual Classification
confidence: 99%
“…CUB-200-2011 [8] is the most widely used dataset for fine-grained image classification [10,11], comprising 11,788 images of 200 subcategories belonging to the same basic-level coarse-grained category of "Bird". It is divided as follows: the training set contains 5,994 images and the testing set contains 5,794 images.…”
Section: Collection
confidence: 99%
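The CUB-200-2011 split quoted above can be sanity-checked with simple arithmetic. A minimal sketch, using only the counts reported in the excerpt (the per-class average is derived here for illustration, not stated in the source):

```python
# Sanity check of the CUB-200-2011 split reported in the excerpt.
# All counts are taken from the quoted text, not read from the dataset itself.
n_classes = 200
n_train, n_test = 5994, 5794
n_total = n_train + n_test

# The two splits must account for every image in the dataset.
assert n_total == 11788

# Roughly 30 training images per subcategory, which is why CUB-200-2011
# is commonly treated as a low-data, fine-grained benchmark.
train_per_class = n_train / n_classes
print(n_total, round(train_per_class, 2))
```

The dataset itself ships a `train_test_split.txt` file mapping image IDs to the train/test flag; reading it would reproduce exactly these counts.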