2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021
DOI: 10.1109/cvpr46437.2021.00833
Positive Sample Propagation along the Audio-Visual Event Line

Cited by 58 publications (21 citation statements) · References 20 publications
“…Hanyu X. et al. [6] proposed to learn inter- and intra-modal information between the visual and audio modalities through adaptive attention and self-attention modules. Jinxing Z. et al. [3] aggregate relevant information that may not be available at the same time through the positive sample propagation model. Both use the auditory-guided visual attention module that we discuss below.…”
Section: Related Work
Confidence: 99%
“…The sound-source separation schemes proposed in [18], [19] show that the voices of different speakers can be distinguished by attending to the spatial region around each speaker and finding matching sound-source information. In the audio-visual event localization task, [12] first adopted the auditory-guided visual attention mechanism, and [3], [6], [9] have followed it.…”
Section: Related Work
Confidence: 99%
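The auditory-guided visual attention mechanism mentioned above can be sketched as follows: an audio embedding scores each spatial region of the visual feature map, and a softmax over those scores weights the regions. This is a minimal NumPy sketch under stated assumptions — the function name, feature shapes, and dot-product scoring are illustrative, not the cited papers' exact implementation.

```python
import numpy as np

def audio_guided_visual_attention(audio_feat, visual_feats):
    """Weight visual regions by their affinity to the audio feature.

    audio_feat:   (d,)   audio embedding for one segment (assumed shape)
    visual_feats: (r, d) visual embeddings for r spatial regions
    Returns the attended visual vector (d,) and attention weights (r,).
    """
    scores = visual_feats @ audio_feat            # (r,) region-audio affinities
    scores = scores - scores.max()                # stabilize the softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    attended = weights @ visual_feats             # (d,) attention-weighted average
    return attended, weights

# Toy usage with random features
rng = np.random.default_rng(0)
audio = rng.normal(size=8)
regions = rng.normal(size=(4, 8))
attended, weights = audio_guided_visual_attention(audio, regions)
```

In practice the affinity is usually computed after projecting both modalities into a shared space; a raw dot product is used here only to keep the sketch short.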
“…Lin and Wang (2020) devise an Audiovisual Transformer that uses audio as the guiding modality to refine visual features, performing spatial attention on contextual frames and instance attention to locate the sound source within a frame. The Positive Sample Propagation module of Zhou et al. (2021) computes similarity matrices between the audio and visual features of different segments and thresholds them to eliminate insignificant audio-visual pairs. These matrices are used to co-refine similar segments before fusing the modality information and learning temporal dependencies with LSTMs.…”
Section: Related Work
Confidence: 99%
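The thresholded similarity scheme described in the statement above can be sketched in a few lines: compute a cross-modal similarity matrix between per-segment audio and visual features, normalize it, zero out pairs below a threshold, and use the surviving weights to aggregate features from the other modality. This is a single-direction NumPy sketch under stated assumptions — the function name, the threshold value, and the softmax normalization are illustrative choices, not the authors' exact PSP implementation.

```python
import numpy as np

def positive_sample_propagation(audio, visual, tau=0.05):
    """Refine audio features using only "positive" visual segments.

    audio, visual: (T, d) per-segment features (assumed shapes).
    Pairs whose normalized similarity falls below tau are pruned,
    and the remaining weights aggregate visual features per audio segment.
    """
    sim = audio @ visual.T                              # (T, T) cross-modal similarity
    sim = np.exp(sim - sim.max(axis=1, keepdims=True))  # row-wise softmax,
    sim = sim / sim.sum(axis=1, keepdims=True)          # numerically stabilized
    sim = np.where(sim >= tau, sim, 0.0)                # prune insignificant pairs
    denom = sim.sum(axis=1, keepdims=True)
    denom[denom == 0.0] = 1.0                           # guard rows with no survivors
    sim = sim / denom                                   # re-normalize surviving weights
    return sim @ visual                                 # (T, d) refined audio features

# Toy usage with random per-segment features
rng = np.random.default_rng(1)
audio_feats = rng.normal(size=(5, 8))
visual_feats = rng.normal(size=(5, 8))
refined = positive_sample_propagation(audio_feats, visual_feats)
```

The paper applies this co-refinement in both directions before fusion; the one-direction version above is kept deliberately minimal.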
“…By integrating audio and visual information in multimodal scenes, a model can exploit richer scene information and overcome the limited perception of a single modality. Recently, several works have used the audio and visual modalities to facilitate multimodal scene understanding from different perspectives, such as sound source localization [23,31,34,37,48] and separation [10,13,41,59,61,63], audio inpainting [62], event localization [4,43,64], action recognition [14], video parsing [42,47], captioning [24,40,50], and dialog [1,66].…”
Section: Audio-Visual Learning
Confidence: 99%