Interspeech 2019
DOI: 10.21437/interspeech.2019-1487

Multimodal Word Discovery and Retrieval with Phone Sequence and Image Concepts

Abstract: This paper demonstrates three different systems capable of performing the multimodal word discovery task. A multimodal word discovery system accepts, as input, a database of spoken descriptions of images (or a set of corresponding phone transcripts) and learns a lexicon, which is a mapping from phone strings to their associated image concepts. Three systems are demonstrated: one based on a statistical machine translation (SMT) model and two based on neural machine translation (NMT). On Flickr8k, the SMT-based mod…

Cited by 3 publications (1 citation statement)
References 22 publications
“…In other words, the national media image acts as a bridge between the national physical image and the cognitive image. [5] From an information dissemination perspective, the national media image essentially involves the construction and dissemination of information related to the national image. In this context, "information" specifically refers to the concrete content conveyed by the abstract symbol of the national image.…”
Section: National Media Image (mentioning)
confidence: 99%