Abstract: Query by Image Content (QBIC) systems, subsequently known as Content-Based Image Retrieval (CBIR) systems, can offer an advantageous solution in a variety of applications, including medical imaging, meteorology, search by image, and others. Such systems primarily use similarity-matching algorithms to compare image content and retrieve relevant images from databases. They essentially measure the distance between visual features extracted from a query image and their counterparts in the dataset. One of the most…
“…Database images that fall outside the interval are ignored, which speeds the model up. Recent CBIR systems use convolutional layers as the deep-learning architecture for feature extraction [2], [5], [8], [13]. Some of the most recent CBIR systems use Transformers for feature extraction, which has led to satisfactory results [3], [4], [6], [7], [8]. Transformers have driven a major evolution in AI, including image processing: [4], [6], [7], and [8] used the Vision Transformer as a feature extractor. [6] used the Vision Transformer with a metric-learning objective. [8] used the Vision Transformer for Sketched-Real Image Retrieval (SRIR) alongside Info-GAN on the ESRIR dataset. I used the BEIT Transformer, which achieved better top-1 accuracy than the Vision Transformer on ImageNet classification [15].…”
Section: Introduction
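The quoted passage above mentions that database images falling outside an interval are skipped to speed retrieval up, but the excerpt does not specify the pruning criterion. The sketch below illustrates one plausible interval test (a scalar per image, e.g. a feature norm, compared against a band around the query's value); the function name, the tolerance parameter, and the toy values are all hypothetical.

```python
def prune_by_interval(query_norm, db_norms, tol):
    """Keep only database images whose scalar summary (here, a
    feature norm) lies within [query_norm - tol, query_norm + tol];
    the rest are skipped entirely, avoiding full feature comparison."""
    lo, hi = query_norm - tol, query_norm + tol
    return [name for name, n in db_norms.items() if lo <= n <= hi]

# Toy database: image name -> precomputed feature norm.
db = {"a": 0.9, "b": 2.5, "c": 1.1}
print(prune_by_interval(1.0, db, 0.2))   # ['a', 'c']
```

Only the surviving candidates would then go through the expensive feature-distance comparison.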
I present DHam, a new and exact unsupervised-learning model for Content-Based Image Retrieval (CBIR) that does not need any training dataset. DHam is accurate especially when dealing with multiple-objects images with background (MOIBs). This is the first time that pre-trained Detic and a pre-trained self-supervised image transformer (SSIT), BEIT, have been combined for CBIR. First, I use pre-trained Detic to detect the objects in an image. Then I extract each object's features with pre-trained BEIT. DHam shows its superiority when the search is performed among multiple-objects images with background (MOIBs). In addition, DHam's test results are compared with pure-BEIT and pure-ResNet CBIR models. On the other hand, it is not a fast model: it takes around 19 and 273 seconds to compare the input image with 44,891 and 1,868,672 features, respectively. Compared to state-of-the-art CBIR systems, DHam may return irrelevant images but is less likely to miss the target similar image.
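The abstract above describes a two-stage pipeline: Detic proposes object regions, BEIT embeds each region into a feature vector, and retrieval then reduces to comparing sets of per-object features. The excerpt does not spell out the matching rule, so the sketch below shows only one plausible choice (smallest pairwise Euclidean distance between any query-object feature and any database-object feature); the function names and the 2-D toy features standing in for BEIT embeddings are hypothetical.

```python
import numpy as np

def min_pairwise_distance(query_feats, db_feats):
    """Smallest Euclidean distance between any query-object feature
    and any database-object feature (both arrays: n_objects x dim)."""
    # Pairwise differences via broadcasting: (q, 1, d) - (1, m, d)
    diffs = query_feats[:, None, :] - db_feats[None, :, :]
    return np.linalg.norm(diffs, axis=-1).min()

def rank_database(query_feats, database):
    """Rank database images by their best-matching object feature."""
    scores = {name: min_pairwise_distance(query_feats, feats)
              for name, feats in database.items()}
    return sorted(scores, key=scores.get)

# Toy example: two detected objects in the query image.
query = np.array([[0.0, 0.0], [5.0, 5.0]])
database = {
    "img_a": np.array([[0.1, 0.0]]),   # close to the first query object
    "img_b": np.array([[9.0, 9.0]]),   # far from both query objects
}
print(rank_database(query, database))  # img_a ranked first
```

Matching on the best per-object pair, rather than on one whole-image feature, is what lets a multi-object query still match a database image that shares only one of its objects.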
“…smartphones and tablets. With their widespread use, a great number of digital photos are taken by people anytime and anywhere, and are usually delivered to and then stored on cloud servers [1], [2], [3], [4], [5]. For any image among this enormous number of photos, there may be other images with similar content, which can thus be called similar images.…”
As an important paradigm of image-set management, image insertion refers to adding new photos to an existing compressed image set. Recently, several algorithms have made significant progress in image insertion. However, due to complex inter-image relationships, coding performance still has room to improve. To address this issue, this paper proposes a high-coding-efficiency image-insertion algorithm for compressed image sets. To maximize coding performance, each new picture must thoroughly exploit the correlations between itself and all the other images. Specifically, in our proposed approach images are first divided into two kinds: to-be-inserted images and compressed images, where the former comprises the new photographs and the latter composes the compressed image set. Second, a depth- and topology-constrained minimum spanning tree (DTCMST) heuristic is proposed to fully investigate the relationships not only between to-be-inserted images and compressed images but also among the different to-be-inserted images. With the generated DTCMST, the depth requirement of the existing compressed image set is satisfied and its topology structure is kept unchanged. Finally, after encoding each to-be-inserted image using its assigned parent vertex in the DTCMST as its prediction reference, a new compressed image set is eventually established. Experimental results show that, compared with state-of-the-art methods, our proposed algorithm achieves an average bit-rate saving of up to 5.1 % and a Bjøntegaard-delta peak-signal-to-noise-ratio improvement of up to 0.41 dB, with similar computational complexity.

INDEX TERMS Image insertion, compressed image set, minimum spanning tree, coding efficiency, image set management.
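The abstract above outlines building a spanning tree over images, where attaching a child to a parent means coding the child predictively from that parent, subject to a maximum tree depth and a frozen existing topology. The sketch below is only an illustration of that idea as a greedy depth-constrained attachment, not the paper's actual DTCMST construction; the function name, the cost table, and the toy values are hypothetical.

```python
def insert_images(cost, depth, fixed, new_ids, max_depth):
    """Greedy depth-constrained attachment (illustrative only):
    each to-be-inserted image picks the cheapest parent whose depth
    stays below max_depth; the existing tree is never restructured.

    cost[i][j] -- prediction cost of coding image i from parent j
    depth      -- dict: image id -> depth in the existing tree
    fixed      -- ids already in the compressed set (initial parents)
    new_ids    -- ids of the to-be-inserted images
    """
    depth = dict(depth)          # do not mutate the caller's tree
    parents = {}
    pending = set(new_ids)
    candidates = set(fixed)
    while pending:
        # Globally cheapest valid (child, parent) attachment.
        child, parent = min(
            ((i, j) for i in pending for j in candidates
             if depth[j] + 1 <= max_depth),
            key=lambda e: cost[e[0]][e[1]],
        )
        parents[child] = parent
        depth[child] = depth[parent] + 1
        pending.remove(child)
        candidates.add(child)    # new images may parent later ones
    return parents

# Toy example: image 0 is the root of the existing set (depth 0);
# images 1 and 2 are new.  cost[i][j] = cost of predicting i from j.
cost = {1: {0: 5, 2: 1}, 2: {0: 2, 1: 10}}
print(insert_images(cost, {0: 0}, {0}, [1, 2], max_depth=2))
# image 2 attaches to 0, then image 1 attaches to 2
```

Allowing already-attached new images to serve as parents is what captures the correlations among the to-be-inserted images themselves, not just their correlations with the compressed set.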