Proceedings of the 30th ACM International Conference on Multimedia 2022
DOI: 10.1145/3503161.3548361

Learning to Retrieve Videos by Asking Questions

Cited by 8 publications (1 citation statement)
References 36 publications
“…With the progress in AI, the integration of information from different modalities, such as text, image, audio, and video, has been known to provide complete information for building effective end-to-end dialogue systems [58,27,39,37] by bringing the different areas of computer vision (CV) and natural language processing (NLP) together. Hence, a multimodal dialogue system bridges the gap between vision and language, ensuring interdisciplinary research.…”
Section: Introduction (mentioning)
confidence: 99%