2021 IEEE International Conference on Multimedia and Expo (ICME)
DOI: 10.1109/icme51207.2021.9428373
MPN: Multimodal Parallel Network for Audio-Visual Event Localization

Cited by 15 publications (4 citation statements)
References 14 publications
“…2) To achieve better performance, Global-Local [16] samples video frames at 10 FPS on the AVE [5] dataset for data augmentation. In our work, to keep the experimental setup consistent, we keep 1 FPS in our method, the same as the existing literature [5]-[11], [74]. 3) The authors of Global-Local [16] suggest using 16 Tesla P100 GPUs to handle their large-scale dataset of 240k videos, while our model can be trained lightly with just one GTX 1080 GPU and no extra data.…”
Section: B2 More Discussion on the Comparison to Self-supervised Methods
Mentioning confidence: 99%
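The 1 FPS sampling setup quoted above is simple to reproduce. Below is a minimal frame-extraction sketch, assuming OpenCV; the helper name and the 30 FPS fallback are illustrative and do not come from any of the cited papers.

```python
import cv2

def sample_frames_1fps(video_path):
    """Extract roughly one frame per second of video (hypothetical helper)."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # metadata may be missing; assume 30
    step = max(int(round(fps)), 1)           # native frames between samples
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:                  # keep the first frame of each second
            frames.append(frame)
        idx += 1
    cap.release()
    return frames
```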
“…Yu et al. (2021) applied the self-attention module and cross-modal attention module to provide precise event localization results. H. Chen et al.…”
Section: Related Work
Mentioning confidence: 99%
“…Wu et al. (2019) proposed dual attention matching, which used each modality's features as guidance for the other and outperformed other state-of-the-art methods on the localization task. Yu et al. (2021) applied the self-attention module and cross-modal attention module to provide precise event localization results. H. Chen et al. (2021) evaluated the localization of objects with annotated bounding boxes by calculating the cosine similarity between the visual and auditory features.…”
Section: Multi-modal Learning
Mentioning confidence: 99%