Atrous Pyramid Transformer with Spectral Convolution for Image Inpainting

Huang, Muqi; Zhang, Lefei

doi:10.1145/3503161.3548348

Cited by 7 publications

(2 citation statements)

References 24 publications


“…The first transformer stacks self-attention layers and outperforms the best result of that time (Vaswani et al 2017). The idea of transformers inspires researchers to develop more transformer architectures (Huang and Zhang 2022). For example, a newly proposed model, DeiT III (Touvron, Cord, and Jégou 2022), is a variant of ViT that incorporates a new data enhancement procedure that includes Gaussian blur, solarization, and grayscale, and it achieves a competitive performance in image classification.…”

Section: Related Work, Deep Learning Based Methods (mentioning)

confidence: 99%

Hear You Say You: An Efficient Framework for Marine Mammal Sounds’ Classification

Liu et al. 2024. AAAI


Marine mammals and their ecosystem face significant threats from, for example, military active sonar and marine transportation. To mitigate this harm, early detection and classification of marine mammals are essential. While recent efforts have utilized spectrogram analysis and machine learning techniques, there remain challenges in their efficiency. Therefore, we propose a novel knowledge distillation framework, named XCFSMN, for this problem. We construct a teacher model that fuses the features extracted from an X-vector extractor, a DenseNet and Cross-Covariance attended compact Feed-Forward Sequential Memory Network (cFSMN). The teacher model transfers knowledge to a simpler cFSMN model through a temperature-cooling strategy for efficient learning. Compared to multiple convolutional neural network backbones and transformers, the proposed framework achieves state-of-the-art efficiency and performance. The improved model size is approximately 20 times smaller and the inference time can be 10 times shorter without affecting the model’s accuracy.


“…Ren et al [23] used smoothed images without edges to train a structure reconstructor, which generated the structures of the missing areas and then a texture generator employed the reconstructed structures with an appearance flow to generate the final restored images. Huang et al [24] designed a two-stage approach based on a novel atrous pyramid transformer (APT) for image inpainting. The inpainting method first uses several layers of APT blocks to restore the semantic structures of images and then a dual spectral transform convolutional (DSTC) module is applied to work together with the APT to infer the textural details of damaged areas.…”

Section: Multistage Image Inpainting (mentioning)

confidence: 99%

Hierarchical Vector-Quantized Variational Autoencoder and Vector Credibility Mechanism for High-Quality Image Inpainting

Li, Xu, Chen. 2024. Electronics


Image inpainting infers the missing areas of a corrupted image from the information in the undamaged part. Many existing image inpainting methods can generate plausible results from damaged images thanks to rapidly developing deep-learning technology. However, they still suffer from over-smoothed textures or textural distortion when the textural details are complex or the damaged areas are large. To restore textures at a fine-grained level, we propose an image inpainting method based on a hierarchical VQ-VAE with a vector credibility mechanism. It first trains the hierarchical VQ-VAE with ground truth images to update two codebooks and to obtain two corresponding vector collections containing information on ground truth images. The two vector collections are fed to a decoder to generate the corresponding high-fidelity outputs. An encoder is then trained with the corresponding damaged image. It generates vector collections approximating the ground truth with the help of the prior knowledge provided by the codebooks. After that, the two vector collections pass through the decoder from the hierarchical VQ-VAE to produce the inpainted results. In addition, we apply a vector credibility mechanism to promote vector collections from damaged images and approximate vector collections from ground truth images. To further improve the inpainting result, we apply a refinement network, which uses residual blocks with different dilation rates to acquire both global information and local textural details. Extensive experiments conducted on several datasets demonstrate that our method outperforms state-of-the-art methods.


DF3Net: Dual frequency feature fusion network with hierarchical transformer for image inpainting

Huang, Yu, Zhang. 2024. Information Fusion
