Proceedings of the 29th ACM International Conference on Multimedia 2021
DOI: 10.1145/3474085.3476968
Multimodal Relation Extraction with Efficient Graph Alignment

Cited by 29 publications (24 citation statements) · References 30 publications
“…Detailed statistics are shown in Table 3. For multimodal RE, we evaluate on MNRE [59], a manually-labeled dataset for multimodal neural relation extraction, where the texts and image posts are crawled from Twitter. For multimodal NER, we conduct experiments on the public Twitter-2017 dataset [25], which mainly includes multimodal user posts published on Twitter during 2016–2017.…”
Section: Methods
mentioning
confidence: 99%
“…Besides, Sergieh et al [33] and Wang et al [47] jointly encode and fuse the visual and structural knowledge for multimodal link prediction through simple concatenation and autoencoder, respectively. On the other hand, Zheng et al [59] present an efficient modality alignment strategy based on scene graph for the MRE task. Zhang et al [55] fuse regional image features and textual features with extra co-attention layers for the MNER task.…”
Section: Superman Returns
mentioning
confidence: 99%
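The co-attention fusion of regional image features and textual token features mentioned in the citation statement above can be illustrated with a minimal sketch. This is not the implementation from any of the cited papers; the module name, feature dimensions (e.g. 2048-d detected regions, 768-d token embeddings), and the bidirectional attention layout are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn


class CoAttentionFusion(nn.Module):
    """Hypothetical co-attention block fusing text tokens with image regions."""

    def __init__(self, hidden_dim: int = 768, image_dim: int = 2048, num_heads: int = 8):
        super().__init__()
        # Project regional image features (e.g. Faster R-CNN outputs) into the text space.
        self.image_proj = nn.Linear(image_dim, hidden_dim)
        # Text tokens attend over image regions (one direction of co-attention).
        self.text_to_image = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        # Image regions attend over text tokens (the other direction).
        self.image_to_text = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.fuse = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, text_feats: torch.Tensor, image_feats: torch.Tensor) -> torch.Tensor:
        # text_feats:  (batch, num_tokens, hidden_dim), e.g. BERT token embeddings
        # image_feats: (batch, num_regions, image_dim), e.g. detected object regions
        img = self.image_proj(image_feats)
        text_ctx, _ = self.text_to_image(text_feats, img, img)        # tokens attend to regions
        img_ctx, _ = self.image_to_text(img, text_feats, text_feats)  # regions attend to tokens
        # Pool the region-side context, broadcast it back to token length,
        # and concatenate both attended views for a downstream NER/RE head.
        img_ctx = img_ctx.mean(dim=1, keepdim=True).expand_as(text_ctx)
        return self.fuse(torch.cat([text_ctx, img_ctx], dim=-1))


# Usage sketch: a batch of 4 posts, 32 tokens each, with 36 detected regions per image.
fusion = CoAttentionFusion()
fused = fusion(torch.randn(4, 32, 768), torch.randn(4, 36, 2048))
print(fused.shape)  # torch.Size([4, 32, 768])
```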