2022
DOI: 10.1109/access.2022.3163256
Improving Bag-of-Deep-Visual-Words Model via Combining Deep Features With Feature Difference Vectors

Abstract: The Bag-of-Deep-Visual-Words (BoDVW) model has shown advantages over the Convolutional Neural Network (CNN) model in image classification tasks with a small number of training samples. An essential step in the BoDVW model is extracting deep features using an off-the-shelf CNN model as a feature extractor. Two deep feature extraction methods have been proposed in recent years. The first method densely samples multi-scale image patches and then converts them into deep features via a deep-level fully-connected layer. The sec…
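The patch-based extraction described in the abstract can be sketched as follows. This is a minimal illustration only, assuming a torchvision ResNet-18 as the off-the-shelf feature extractor and arbitrary patch sizes and stride; none of these choices are taken from the paper.

```python
# Sketch of dense multi-scale patch sampling with an off-the-shelf CNN used
# purely as a feature extractor (Ext-DFs(FC)-style). The backbone, patch sizes,
# and stride below are assumptions for illustration, not the paper's settings.
import torch
import torchvision.models as models
import torchvision.transforms.functional as TF

backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()          # keep the 512-d global-pooling output
backbone.eval()

@torch.no_grad()
def extract_deep_features(image, patch_sizes=(64, 96, 128), stride=32):
    """Return a (num_patches, 512) tensor of per-patch deep features.

    image: float tensor of shape (3, H, W) in [0, 1].
    (ImageNet mean/std normalization is omitted here for brevity.)
    """
    features = []
    _, H, W = image.shape
    for p in patch_sizes:                  # multi-scale dense sampling
        for top in range(0, H - p + 1, stride):
            for left in range(0, W - p + 1, stride):
                patch = image[:, top:top + p, left:left + p]
                patch = TF.resize(patch, [224, 224], antialias=True)
                features.append(backbone(patch.unsqueeze(0)).squeeze(0))
    return torch.stack(features)
```

The per-patch vectors taken from the global-pooling (or fully-connected) layer serve as the local deep descriptors that the BoDVW pipeline subsequently quantizes and encodes.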

Cited by 5 publications (2 citation statements) · References 34 publications
“…The first winner, AlexNet [ 16 ], expanded the input image size from 32 × 32 in LeNet to 224 × 224, increasing the model size, but solved the potential problem of overfitting by applying dropout layers and significantly improved its accuracy on the ImageNet tests from 73.8% to 83.7% by applying the rectified linear unit (ReLU) activation function. VggNet [ 17 ] achieved remarkable results with an accuracy of 93.2% by increasing the number of convolutional filters and expanding the layer structure while unifying the convolution filter size to 3 × 3 to reduce computation. However, adding further layers brought no significant improvement over the 16-layer configuration.…”
Section: Methods · mentioning · confidence: 99%
“…The other (denoted as Ext-DFs(FC)) extracts multi-scale image patches and uses the output vectors from a fully-connected or global pooling layer as deep features. A recent study [ 41 ] utilized these two methods, but it did not provide experimental results under different dictionary sizes and encoding methods, and also did not fully analyze the computational costs. In this article, we take a closer look at these two methods.…”
Section: Introduction · mentioning · confidence: 99%
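To make the dictionary-size and encoding discussion in the statement above concrete, here is a minimal sketch of the BoDVW encoding stage, assuming a k-means visual dictionary and hard-assignment histogram encoding. The dictionary size of 512 is illustrative; the cited study compares several dictionary sizes and encoding methods, none of which are reproduced here.

```python
# Sketch of the BoDVW encoding step under simple assumptions: k-means builds a
# visual dictionary over patch-level deep features, and each image becomes an
# L1-normalized histogram of hard assignments to the visual words.
import numpy as np
from sklearn.cluster import KMeans

def build_dictionary(train_features, dictionary_size=512, seed=0):
    """train_features: (N, D) array of deep features pooled from training images."""
    kmeans = KMeans(n_clusters=dictionary_size, random_state=seed, n_init="auto")
    kmeans.fit(train_features)
    return kmeans

def encode_image(kmeans, image_features):
    """Encode one image's (num_patches, D) deep features as a BoDVW histogram."""
    words = kmeans.predict(image_features)              # nearest visual word per patch
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(np.float64)
    return hist / max(hist.sum(), 1.0)                  # L1 normalization
```

The resulting histograms can then be fed to a conventional classifier (e.g., an SVM), which is the usual final stage of a bag-of-visual-words pipeline.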