2022
DOI: 10.1109/access.2022.3163256
Improving Bag-of-Deep-Visual-Words Model via Combining Deep Features With Feature Difference Vectors

Abstract: The Bag-of-Deep-Visual-Words (BoDVW) model has shown advantages over the Convolutional Neural Network (CNN) model in image classification tasks with a small number of training samples. An essential step in the BoDVW model is extracting deep features using an off-the-shelf CNN model as a feature extractor. Two deep feature extraction methods have been proposed in recent years. The first method densely samples multi-scale image patches and then converts them into deep features via a deep-level fully-connected layer. The sec…
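The patch-based extraction described in the abstract can be sketched as follows. This is a minimal illustration only, assuming a torchvision ResNet-18 as the off-the-shelf feature extractor and arbitrary patch sizes and stride; none of these choices are taken from the paper.

```python
# Sketch of dense multi-scale patch sampling with an off-the-shelf CNN used
# purely as a feature extractor (Ext-DFs(FC)-style). The backbone, patch sizes,
# and stride below are assumptions for illustration, not the paper's settings.
import torch
import torchvision.models as models
import torchvision.transforms.functional as TF

backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()          # keep the 512-d global-pooling output
backbone.eval()

@torch.no_grad()
def extract_deep_features(image, patch_sizes=(64, 96, 128), stride=32):
    """Return a (num_patches, 512) tensor of per-patch deep features.

    image: float tensor of shape (3, H, W) in [0, 1].
    (ImageNet mean/std normalization is omitted here for brevity.)
    """
    features = []
    _, H, W = image.shape
    for p in patch_sizes:                  # multi-scale dense sampling
        for top in range(0, H - p + 1, stride):
            for left in range(0, W - p + 1, stride):
                patch = image[:, top:top + p, left:left + p]
                patch = TF.resize(patch, [224, 224], antialias=True)
                features.append(backbone(patch.unsqueeze(0)).squeeze(0))
    return torch.stack(features)
```

The per-patch vectors taken from the global-pooling (or fully-connected) layer serve as the local deep descriptors that the BoDVW pipeline subsequently quantizes and encodes.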

Cited by 5 publications (2 citation statements) · References 34 publications
“…The first winner, AlexNet [ 16 ], expanded the input image size from 32 × 32 in LeNet to 224 × 224, increasing the model size, but solved the potential problem of overfitting by applying dropout layers and significantly improved its accuracy on the ImageNet tests from 73.8% to 83.7% by applying the rectified linear unit (ReLU) activation function. VggNet [ 17 ] achieved remarkable results with an accuracy of 93.2% by increasing the number of convolutional filters and expanding the layer structure while unifying the convolution filter size to 3 × 3 to reduce computation. However, adding further layers brought no significant improvement over the 16-layer configuration.…”
Section: Methods · mentioning · confidence: 99%
“…The other (denoted as Ext-DFs(FC)) extracts multi-scale image patches and uses the output vectors from a fully-connected or global pooling layer as deep features. A recent study [ 41 ] utilized these two methods, but it did not provide experimental results under different dictionary sizes and encoding methods, and also did not fully analyze the computational costs. In this article, we take a closer look at these two methods.…”
Section: Introduction · mentioning · confidence: 99%
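To make the dictionary-size and encoding discussion in the statement above concrete, here is a minimal sketch of the BoDVW encoding stage, assuming a k-means visual dictionary and hard-assignment histogram encoding. The dictionary size of 512 is illustrative; the cited study compares several dictionary sizes and encoding methods, none of which are reproduced here.

```python
# Sketch of the BoDVW encoding step under simple assumptions: k-means builds a
# visual dictionary over patch-level deep features, and each image becomes an
# L1-normalized histogram of hard assignments to the visual words.
import numpy as np
from sklearn.cluster import KMeans

def build_dictionary(train_features, dictionary_size=512, seed=0):
    """train_features: (N, D) array of deep features pooled from training images."""
    kmeans = KMeans(n_clusters=dictionary_size, random_state=seed, n_init="auto")
    kmeans.fit(train_features)
    return kmeans

def encode_image(kmeans, image_features):
    """Encode one image's (num_patches, D) deep features as a BoDVW histogram."""
    words = kmeans.predict(image_features)              # nearest visual word per patch
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(np.float64)
    return hist / max(hist.sum(), 1.0)                  # L1 normalization
```

The resulting histograms can then be fed to a conventional classifier (e.g., an SVM), which is the usual final stage of a bag-of-visual-words pipeline.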