“…The first group of methods includes BiLSTM-CRF (Huang et al., 2015), BERT-CRF (Devlin et al., 2018), and span-based NER models (e.g., BERT-span and RoBERTa-span (Yamada et al., 2020)), which consider only the original text. The second group comprises several recent multimodal approaches to the MNER task: UMT (Yu et al., 2020), UMGF, MNER-QG (Jia et al., 2022), R-GCN, ITA (Wang et al., 2021a), PromptMNER (Wang et al., 2022b), CAT-MNER (Wang et al., 2022c), and MoRe (Wang et al., 2022a), which consider both the text and its corresponding images.…”