2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr46437.2021.01274
Robust Audio-Visual Instance Discrimination

Cited by 65 publications (29 citation statements)
References 39 publications
“…Faulty negatives are instances that should be similar to the anchor but are treated as negative instances in CL. 54 Such faulty negative instances harm the robustness and performance of CL pre-trained models on downstream property prediction tasks. Additionally, the previous motif-level CL learns a motif dictionary and trains a sampler to sample subgraphs within each molecule, 50 which may ignore unique chemical substructure patterns.…”
Section: (-) (-)
mentioning
confidence: 99%
“…Faulty negatives are instances that are similar to the anchor yet are treated as negative instances in contrastive training. 54 In this case, M_A is a "faulty negative": it should not lie far from the anchor in the representation domain, unlike true negative samples such as M_C. Faulty negatives strongly repel the anchor and the negative sample, even though the two should preferably be close in the representation domain.…”
Section: Mitigating Faulty Negatives
mentioning
confidence: 99%
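One hedged way to read the mitigation quoted above: before computing the contrastive loss, identify off-diagonal pairs that are already highly similar to the anchor and exclude them from the set of negatives, rather than repelling them. The PyTorch sketch below illustrates that idea under our own assumptions; the threshold `tau` and the hard-masking rule are illustrative choices, not the exact procedure of the cited papers.

```python
import torch
import torch.nn.functional as F

def masked_info_nce(z_anchor, z_other, temperature=0.07, tau=0.9):
    """InfoNCE variant that masks suspected faulty negatives:
    off-diagonal pairs whose cosine similarity to the anchor exceeds
    `tau` are dropped from the softmax denominator instead of repelled."""
    z_anchor = F.normalize(z_anchor, dim=1)   # (N, D) anchor embeddings
    z_other = F.normalize(z_other, dim=1)     # (N, D) paired embeddings
    sim = z_anchor @ z_other.t()              # (N, N) cosine similarities
    n = sim.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=sim.device)
    faulty = (sim > tau) & ~eye               # likely faulty negatives (assumed rule)
    logits = sim / temperature
    logits = logits.masked_fill(faulty, float('-inf'))  # zero weight in the softmax
    targets = torch.arange(n, device=sim.device)        # positives on the diagonal
    return F.cross_entropy(logits, targets)
```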
“…Recent years have witnessed steady progress on self-supervised representation learning [10,13,31,32]. Contrastive learning [6,7,15,17,22,25,38] has been remarkably successful on multiple tasks [26,30,40]. The core idea of contrastive learning is to align positive sample pairs and repel negative sample pairs.…”
Section: Related Work
mentioning
confidence: 99%
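The align/repel idea in the statement above is commonly instantiated as the InfoNCE loss over a batch of paired embeddings. Below is a minimal PyTorch sketch of that standard formulation; the function name `info_nce` and the temperature value are our illustrative assumptions, not code from the cited papers.

```python
import torch
import torch.nn.functional as F

def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE over a batch: row i of z_a pairs positively with row i of z_b;
    every other row in the batch serves as a negative."""
    z_a = F.normalize(z_a, dim=1)            # unit-norm embeddings (view / modality A)
    z_b = F.normalize(z_b, dim=1)            # unit-norm embeddings (view / modality B)
    logits = z_a @ z_b.t() / temperature     # pairwise cosine similarities, scaled
    targets = torch.arange(z_a.size(0), device=z_a.device)  # positives on the diagonal
    return F.cross_entropy(logits, targets)

# Usage: embeddings from two augmented views, or audio/visual streams
# z_a, z_b = encoder_a(batch_a), encoder_b(batch_b)
# loss = info_nce(z_a, z_b)
```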
“…In addition, these schemes differentiate the features extracted from unpaired video clips. Furthermore, some concurrent methods [28,30] jointly consider the correlations within each modality and across different modalities (i.e., audio and vision). Unlike [28,30], which learn visual information from an entire image, our method leverages pseudo-annotations to provide training guidance from both sounding and non-sounding regions.…”
Section: Related Work
mentioning
confidence: 99%
“…Furthermore, some concurrent methods [28,30] jointly consider the correlations within each modality and across different modalities (i.e., audio and vision). Unlike [28,30], which learn visual information from an entire image, our method leverages pseudo-annotations to provide training guidance from both sounding and non-sounding regions. Second, video temporal information [20,32] is explored to determine strong or weak correlations.…”
Section: Related Work
mentioning
confidence: 99%
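As a rough illustration of the region-level guidance described above, one could score each visual region against the audio embedding and supervise the scores with a binary sounding/non-sounding pseudo-annotation mask. The sketch below is our assumed reading of such a scheme (the names `region_feats`, `audio_feat`, and `sounding_mask` are hypothetical), not the cited method's implementation.

```python
import torch
import torch.nn.functional as F

def region_guided_loss(region_feats, audio_feat, sounding_mask, temperature=0.07):
    """Pull pseudo-labelled sounding regions toward the audio embedding
    and push non-sounding regions away.
    region_feats:  (R, D) visual region embeddings
    audio_feat:    (D,)   audio clip embedding
    sounding_mask: (R,)   1 = sounding region, 0 = non-sounding region
    """
    region_feats = F.normalize(region_feats, dim=1)
    audio_feat = F.normalize(audio_feat, dim=0)
    logits = region_feats @ audio_feat / temperature  # (R,) similarity scores
    # Sounding regions get target 1 (attract), non-sounding get target 0 (repel).
    return F.binary_cross_entropy_with_logits(logits, sounding_mask.float())
```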