Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence 2020
DOI: 10.24963/ijcai.2020/67
EViLBERT: Learning Task-Agnostic Multimodal Sense Embeddings

Abstract: The problem of grounding language in vision is increasingly attracting scholarly efforts. As of now, however, most of the approaches have been limited to word embeddings, which are not capable of handling polysemous words. This is mainly due to the limited coverage of the available semantically-annotated datasets, hence forcing research to rely on alternative technologies (i.e., image search engines). To address this issue, we introduce EViLBERT, an approach which is able to perform image classificatio…

Cited by 6 publications (8 citation statements). References 22 publications (1 reference statement).
“…The image encoder is used to capture the semantic information contained in the images in a BabelNet synset. Previous studies have shown that images can help learn better semantic representations for concepts and entities (Xie et al., 2017a; Calabrese et al., 2020). We believe that images are also beneficial to SPBS.…”
Section: Image Encoder
confidence: 82%
“…It has been utilized in multiple NLP tasks (Navigli et al., 2021), especially cross-lingual or multilingual tasks, such as multilingual word sense disambiguation (Navigli and Ponzetto, 2012b), cross-lingual lexical entailment (Vyas and Carpuat, 2016) and cross-lingual AMR parsing (Blloshmi et al., 2020). Most of these studies regard BabelNet as a large multilingual sense inventory and utilize the multilingual synonyms and glosses in BabelNet synsets, and some studies also use its images, e.g., Calabrese et al. (2020) learn multimodal sense embeddings with the concepts and images in BabelNet.…”
Section: BabelNet
confidence: 99%
“…We note that different kinds of knowledge are orthogonal to each other and can be exploited in conjunction. For example, token classification models benefit from the logits-adjacency matrix multiplication, binary cross-entropy training, translation-based refinement [Luan et al., 2020] and visual information [Calabrese et al., 2020a].…”
Section: Discussion
confidence: 99%
“…In a different direction, Calabrese et al. [2020a] leverage images from the BabelPic dataset [Calabrese et al., 2020b] to build multimodal gloss vectors, which are shown to be stronger than text-only vectors when used to initialize the weights of the classification matrix (in Eq. 1).…”
Section: Supervised WSD Exploiting Other Knowledge
confidence: 99%
“…using LSTMs (Melamud et al., 2016) or the Transformer architecture (Devlin et al., 2019; Conneau et al., 2020), and are capable of representing words based on the context in which they occur. Contextualized representations have also been used to obtain effective sense embeddings (Loureiro and Jorge, 2019; Scarlini et al., 2020a,b; Calabrese et al., 2020).…”
Section: Introduction
confidence: 99%