2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr46437.2021.01274
Robust Audio-Visual Instance Discrimination

Cited by 65 publications (29 citation statements)
References 39 publications
“…Faulty negatives are instances that should be similar to the anchor but are treated as negative instances in CL. 54 Such faulty negative instances harm the robustness and performance of CL pre-trained models on downstream property prediction tasks. Additionally, the previous motif-level CL learns a motif dictionary and trains a sampler to sample subgraphs within each molecule, 50 which may ignore unique chemical substructure patterns.…”
Section: (-) (-)
mentioning
confidence: 99%
“…Faulty negatives are instances that are similar to the anchor yet are treated as negative instances in contrastive training. 54 In this case, M_A is a "faulty negative": it should not lie far from the anchor in the representation domain, unlike true negative samples such as M_C. Faulty negatives strongly repel the anchor and the negative sample, even though the two should preferably be close in the representation domain.…”
Section: Mitigating Faulty Negatives
mentioning
confidence: 99%
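One hedged way to read the mitigation quoted above: before computing the contrastive loss, identify off-diagonal pairs that are already highly similar to the anchor and exclude them from the set of negatives, rather than repelling them. The PyTorch sketch below illustrates that idea under our own assumptions; the threshold `tau` and the hard-masking rule are illustrative choices, not the exact procedure of the cited papers.

```python
import torch
import torch.nn.functional as F

def masked_info_nce(z_anchor, z_other, temperature=0.07, tau=0.9):
    """InfoNCE variant that masks suspected faulty negatives:
    off-diagonal pairs whose cosine similarity to the anchor exceeds
    `tau` are dropped from the softmax denominator instead of repelled."""
    z_anchor = F.normalize(z_anchor, dim=1)   # (N, D) anchor embeddings
    z_other = F.normalize(z_other, dim=1)     # (N, D) paired embeddings
    sim = z_anchor @ z_other.t()              # (N, N) cosine similarities
    n = sim.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=sim.device)
    faulty = (sim > tau) & ~eye               # likely faulty negatives (assumed rule)
    logits = sim / temperature
    logits = logits.masked_fill(faulty, float('-inf'))  # zero weight in the softmax
    targets = torch.arange(n, device=sim.device)        # positives on the diagonal
    return F.cross_entropy(logits, targets)
```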
“…Recent years have witnessed steady progress on self-supervised representation learning [10,13,31,32]. Contrastive learning [6,7,15,17,22,25,38] has been remarkably successful on multiple tasks [26,30,40]. The core idea of contrastive learning is to align positive sample pairs and repel negative sample pairs.…”
Section: Related Work
mentioning
confidence: 99%
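The align/repel idea in the statement above is commonly instantiated as the InfoNCE loss over a batch of paired embeddings. Below is a minimal PyTorch sketch of that standard formulation; the function name `info_nce` and the temperature value are our illustrative assumptions, not code from the cited papers.

```python
import torch
import torch.nn.functional as F

def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE over a batch: row i of z_a pairs positively with row i of z_b;
    every other row in the batch serves as a negative."""
    z_a = F.normalize(z_a, dim=1)            # unit-norm embeddings (view / modality A)
    z_b = F.normalize(z_b, dim=1)            # unit-norm embeddings (view / modality B)
    logits = z_a @ z_b.t() / temperature     # pairwise cosine similarities, scaled
    targets = torch.arange(z_a.size(0), device=z_a.device)  # positives on the diagonal
    return F.cross_entropy(logits, targets)

# Usage: embeddings from two augmented views, or audio/visual streams
# z_a, z_b = encoder_a(batch_a), encoder_b(batch_b)
# loss = info_nce(z_a, z_b)
```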
“…In addition, these schemes differentiate the features extracted from unpaired video clips. Furthermore, some concurrent methods [28,30] jointly consider the correlations within each modality and across different modalities (i.e., audio and vision). Unlike [28,30], which learn visual information from an entire image, our method leverages pseudo-annotations to provide training guidance from both sounding and non-sounding regions.…”
Section: Related Work
mentioning
confidence: 99%
“…Furthermore, some concurrent methods [28,30] jointly consider the correlations within each modality and across different modalities (i.e., audio and vision). Unlike [28,30], which learn visual information from an entire image, our method leverages pseudo-annotations to provide training guidance from both sounding and non-sounding regions. Second, video temporal information [20,32] is explored to determine strong or weak correlations.…”
Section: Related Work
mentioning
confidence: 99%
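As a rough illustration of the region-level guidance described above, one could score each visual region against the audio embedding and supervise the scores with a binary sounding/non-sounding pseudo-annotation mask. The sketch below is our assumed reading of such a scheme (the names `region_feats`, `audio_feat`, and `sounding_mask` are hypothetical), not the cited method's implementation.

```python
import torch
import torch.nn.functional as F

def region_guided_loss(region_feats, audio_feat, sounding_mask, temperature=0.07):
    """Pull pseudo-labelled sounding regions toward the audio embedding
    and push non-sounding regions away.
    region_feats:  (R, D) visual region embeddings
    audio_feat:    (D,)   audio clip embedding
    sounding_mask: (R,)   1 = sounding region, 0 = non-sounding region
    """
    region_feats = F.normalize(region_feats, dim=1)
    audio_feat = F.normalize(audio_feat, dim=0)
    logits = region_feats @ audio_feat / temperature  # (R,) similarity scores
    # Sounding regions get target 1 (attract), non-sounding get target 0 (repel).
    return F.binary_cross_entropy_with_logits(logits, sounding_mask.float())
```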