2022 | DOI: 10.3390/e24060764

Inter- and Intra-Modal Contrastive Hybrid Learning Framework for Multimodal Abstractive Summarization

Abstract: Abstractive summarization technologies let internet users grasp an article by reading only its summary instead of the entire text. However, techniques that analyze articles containing both text and images are hampered by the semantic gap between vision and language: they concentrate on aggregating features while neglecting the heterogeneity of each modality. At the same time, the lack of consideration of intrinsic data properties withi…

Cited by 6 publications (5 citation statements) | References: 55 publications

Citation statements:
“…Li et al. [21] introduced an Inter- and Intra-modal Contrastive Hybrid (ITCH) framework that automatically aligns multimodal information and summarizes it accordingly. ITCH takes bi-modal input, text and image, and passes it through a patch-oriented encoder and a textual encoder to extract features.…”
Section: Related Work (mentioning)
confidence: 99%
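The statement above sketches ITCH's bi-modal front end: a patch-oriented encoder for the image and a textual encoder for the article. Below is a minimal PyTorch sketch of such a two-encoder setup; the module names, dimensions, and layer counts are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PatchEncoder(nn.Module):
    """Patch-oriented image encoder: embeds non-overlapping patches
    (ViT-style) and contextualizes them. Dimensions are illustrative."""
    def __init__(self, patch=16, dim=512, depth=4):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, images):                 # (B, 3, H, W)
        x = self.proj(images)                  # (B, dim, H/16, W/16)
        x = x.flatten(2).transpose(1, 2)       # (B, num_patches, dim)
        return self.encoder(x)                 # patch-level visual features

class TextEncoder(nn.Module):
    """Textual encoder over token ids; a stand-in for whatever
    pretrained encoder ITCH actually uses."""
    def __init__(self, vocab=30522, dim=512, depth=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, tokens):                 # (B, seq_len)
        return self.encoder(self.embed(tokens))  # token-level text features

# Bi-modal input -> two feature streams, later aligned and fused for summarization.
img_feats = PatchEncoder()(torch.randn(2, 3, 224, 224))
txt_feats = TextEncoder()(torch.randint(0, 30522, (2, 64)))
```

Embedding patches with a strided convolution keeps the visual features token-like, which makes them directly comparable with the text stream during alignment.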
“…The samples in R⁰ are regarded as negative examples. As such, we follow (Lin et al. 2022) to define the pairwise objective function L₁(x_a, x̄_a), a ∈ {t, v}, between an anchor sample and its positive or negative samples. The final fully-supervised intra-modal contrastive loss is as follows:…”
Section: Rumor Detection with Contrastive Learning (mentioning)
confidence: 99%
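This excerpt references a pairwise objective L₁ between an anchor and its positive or negative samples within a single modality a ∈ {t, v}. Below is a minimal sketch of one such pairwise intra-modal contrastive term, assuming an InfoNCE-style form with cosine similarity and a temperature τ; the exact loss in the cited work may differ.

```python
import torch
import torch.nn.functional as F

def pairwise_contrastive(anchor, positives, negatives, tau=0.07):
    """Pairwise objective L1(x_a, .) for one modality a in {t, v}:
    pull the anchor toward positives, push it from negatives.
    InfoNCE-style form; an assumption, not the cited paper's exact loss."""
    a = F.normalize(anchor, dim=-1)            # (D,)
    pos = F.normalize(positives, dim=-1)       # (P, D)
    neg = F.normalize(negatives, dim=-1)       # (N, D)
    pos_sim = pos @ a / tau                    # (P,) anchor-positive scores
    neg_sim = neg @ a / tau                    # (N,) anchor-negative scores
    # negative log of the probability mass assigned to the positive pairs
    all_sim = torch.cat([pos_sim, neg_sim])
    return -(torch.logsumexp(pos_sim, 0) - torch.logsumexp(all_sim, 0))

loss = pairwise_contrastive(torch.randn(512),
                            torch.randn(4, 512),   # positives
                            torch.randn(16, 512))  # negatives drawn from R⁰
```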
“…Therefore, cross-modal CL is applied to MSA so that paired image-text data are pulled as close together as possible in the feature space, while unpaired image-text data are pushed as far apart as possible. Realizing semantic interaction and association between images and texts at different levels is one of the future development directions of MSA [174].…”
Section: B. Future Trends (mentioning)
confidence: 99%
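The trend described here, pulling paired image-text embeddings together and pushing unpaired ones apart, is commonly realized as a symmetric cross-modal InfoNCE loss (the objective popularized by CLIP). A minimal sketch under that assumption, not drawn from the cited survey:

```python
import torch
import torch.nn.functional as F

def cross_modal_contrastive(img_emb, txt_emb, tau=0.07):
    """Symmetric cross-modal contrastive loss over a batch: matched
    (i, i) image-text pairs are pulled together, all mismatched
    (i, j) pairs pushed apart. CLIP-style form, assumed here."""
    img = F.normalize(img_emb, dim=-1)          # (B, D)
    txt = F.normalize(txt_emb, dim=-1)          # (B, D)
    logits = img @ txt.t() / tau                # (B, B) pairwise similarities
    targets = torch.arange(img.size(0))         # diagonal holds the true pairs
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

loss = cross_modal_contrastive(torch.randn(8, 512), torch.randn(8, 512))
```

Averaging the image-to-text and text-to-image directions keeps the objective symmetric, so neither modality dominates the alignment.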