Proceedings of the 30th ACM International Conference on Multimedia 2022
DOI: 10.1145/3503161.3548200

Where Are You Looking?

Cited by 12 publications
(3 citation statements)
References 33 publications
“…Text-to-video Retrieval. Video analysis (Wang et al. 2023, 2022; Zeng et al. 2022; Liu et al. 2023b,a; Jin et al. 2022) has recently gained much attention due to the increasing video data on the Internet. Among them, the text-to-video retrieval (T2VR) task (Dong, Li, and Snoek 2018; Chen et al. 2020; Li et al. 2019; Faghri et al. 2017; Gao et al. 2023; Lei, Berg, and Bansal 2021; Li et al. 2023) aims to retrieve relevant videos from a set of pre-trimmed video clips given a text description.…”
Section: Related Work (mentioning)
confidence: 99%
“…The VR headset has a built-in accelerometer and we are able to easily calculate the current headset position (X,Y,Z) and the rotation of the headset (yaw, pitch, and roll). Besides, gaze information is also important as it provides more fine-grained features [12]. For the gaze data collection, we rely on the built-in eye tracker in the headset with a sample rate of 144 Hz.…”
Section: Data Collection Procedures (mentioning)
confidence: 99%
“…The key difference between volumetric video compared with traditional 2D flat video [4,5] lies in the 3D representation, where the commonly used formats are point cloud, mesh, voxel, and the recent implicit neural representation. Among all these representations, point cloud is currently the most popular due to its simplicity and easy deployment [6].…”
Section: Introduction (mentioning)
confidence: 99%