Aiding Intra-Text Representations with Visual Context for Multimodal Named Entity Recognition
2019 · Preprint · DOI: 10.48550/arxiv.1904.01356

Cited by 2 publications (2 citation statements; published 2019 and 2022) · References 0 publications
“…In multimodal interactivity, it calculates the matching between two features with a single attention, resulting in incorrect or incomplete attention [31]. Image captioning is a technique for generating a word from the visual region most closely related to the most recently generated term. With the purpose of obtaining richer semantic connections across multiple modalities, the proposed study employs an attention implementation similar to those employed in VQA.…”
Section: Introduction (mentioning) · confidence: 99%
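The statement above contrasts a single attention pass between two modalities with richer VQA-style schemes. A minimal sketch of that single-pass setup, in which each text token queries all image regions once, is shown below; all shapes, names, and the dot-product scoring are illustrative assumptions, not the cited paper's actual implementation.

```python
# Minimal sketch of single-pass cross-modal attention: text features attend
# over image-region features once. Shapes and scoring are assumptions for
# illustration, not the cited paper's method.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(text_feats, image_feats):
    """One attention pass: each text token queries all image regions.

    text_feats:  (T, d) word representations
    image_feats: (R, d) image-region representations
    Returns a (T, d) visual context vector per token.
    """
    scores = text_feats @ image_feats.T   # (T, R) token-region matching
    weights = softmax(scores, axis=-1)    # attention over regions per token
    return weights @ image_feats          # (T, d) attended visual context

# Toy usage: 5 tokens, 3 image regions, 8-dim features.
rng = np.random.default_rng(0)
ctx = cross_modal_attention(rng.normal(size=(5, 8)),
                            rng.normal(size=(3, 8)))
print(ctx.shape)  # (5, 8)
```

Because this computes only one matching between the two feature sets, a token whose relevant region scores poorly gets a misleading context vector, which is the "incorrect or incomplete attention" the statement criticizes.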
“…Typically, users combine text, image, audio, or video to sell a product on an e-commerce platform or to express views on social media. The combination of these media types has been extensively studied for various tasks, including classification [1], [2], [3], cross-modal retrieval [4], semantic relatedness [5], [6], image captioning [7], [8], multimodal named entity recognition [9], [10], and Visual Question Answering [11], [12]. In addition, multimodal data has fueled an increased interest in generating images conditioned on natural language [13], [14].…”
Section: Introduction (mentioning) · confidence: 99%