2022
DOI: 10.1109/lgrs.2022.3192062
TypeFormer: Multiscale Transformer With Type Controller for Remote Sensing Image Caption

Cited by 16 publications (8 citation statements)
References 19 publications
“…Firstly, we conduct the performance experiments of the accounting feature extraction method based on multimodal information embedding on the Synthetic Financial Dataset. We selected CNN (Kattenborn et al., 2021), LSTM (Yu et al., 2019), Transformer (Vaswani et al., 2017), BERT (Deepa, 2021) and TypeFormer (Chen et al., 2022) to compare the performance in Fig. 6 and Table 3.…”
Section: Experiments and Analysis (mentioning)
confidence: 99%
“…Then, we test the performance of multi-objective parameter selection based on a parallel genetic algorithm on the dataset. MPOS will still be compared with CNN (Kattenborn et al., 2021), LSTM (Yu et al., 2019), Transformer (Vaswani et al., 2017), BERT (Deepa, 2021), and TypeFormer (Chen et al., 2022). In this experiment, the model will be evaluated in terms of accuracy, number of parameters, and elapsed time.…”
Section: Experiments and Analysis (mentioning)
confidence: 99%
“…Transformers have been successfully applied to remote sensing image applications, providing a new way to address the insufficient robustness of subject-sensitive hashing. Chen et al. [33] employed a pure multiscale transformer for remote sensing image captioning, which can effectively generate specific types of captions. Zhang et al. [34] built a dual-stream network (DTHNet) based on the transformer for shadow extraction from remote sensing images.…”
Section: Transformers (mentioning)
confidence: 99%
“…Language integration in RS has showcased impressive capabilities across various tasks, including image captioning [2,17-28], VQA [3,29-32], and text-image retrieval [4]. A comprehensive review of NLP applications in RS can be found in [1].…”
Section: NLP in Remote Sensing (mentioning)
confidence: 99%
“…In [26], multi-scale visual features are extracted by a CNN and then decoded by a language transformer. Another approach incorporates the caption type into the caption features within a transformer-based encoder-decoder, enabling the generation of more controlled captions [27]. In [28], visual features extracted by a CNN are fed into a transformer encoder-decoder trained with a self-critical sequence strategy.…”
Section: NLP in Remote Sensing (mentioning)
confidence: 99%
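
The citation statements above repeatedly describe the same captioning pattern: a CNN extracts visual features, and a transformer decoder generates the caption from them. The snippet below is a minimal sketch of that generic pipeline, assuming a toy CNN, placeholder vocabulary and dimension sizes, and PyTorch; it is not the TypeFormer implementation (which additionally uses multiscale features and a type controller) nor any specific cited model.

```python
# Minimal sketch of a CNN-encoder / transformer-decoder captioning pipeline.
# All sizes (vocab_size, d_model, the toy CNN) are illustrative assumptions.
import torch
import torch.nn as nn

class CaptionModel(nn.Module):
    def __init__(self, vocab_size=1000, d_model=256, nhead=8, num_layers=2):
        super().__init__()
        # Toy CNN encoder: turns an image into a grid of visual feature vectors.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, d_model, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.embed = nn.Embedding(vocab_size, d_model)   # caption token embeddings
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.head = nn.Linear(d_model, vocab_size)        # next-token logits

    def forward(self, images, captions):
        # Flatten the CNN feature map into a sequence of "visual tokens".
        feats = self.cnn(images)                           # (B, d_model, H, W)
        memory = feats.flatten(2).transpose(1, 2)          # (B, H*W, d_model)
        tgt = self.embed(captions)                         # (B, T, d_model)
        # Causal mask so each position attends only to earlier caption tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(captions.size(1))
        out = self.decoder(tgt, memory, tgt_mask=mask)
        return self.head(out)                              # (B, T, vocab_size)

# Usage: a random image batch and caption prefix produce next-token logits.
model = CaptionModel()
logits = model(torch.randn(2, 3, 64, 64), torch.randint(0, 1000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 1000])
```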