When watching videos, a visual event is often accompanied by an audio event, e.g., a voice accompanying lip motion or music accompanying the playing of an instrument. This underlying correlation between audio and visual events can be used as a free supervisory signal to train a neural network by solving the pretext task of audiovisual synchronization. In this paper, we propose a novel self-supervised framework with a co-attention mechanism that learns generic cross-modal representations from unlabelled videos in the wild, which in turn benefit downstream tasks. Specifically, we explore three different co-attention modules that focus on discriminative visual regions correlated with the sounds and introduce interactions between them. Experiments show that our model achieves state-of-the-art performance on the pretext task while having fewer parameters than existing methods. To further evaluate the generalizability and transferability of our approach, we apply the pre-trained model to two downstream tasks, i.e., sound source localization and action recognition. Extensive experiments demonstrate that our model achieves results competitive with other self-supervised methods, and also indicate that our approach can handle challenging scenes that contain multiple sound sources.
CCS CONCEPTS • Information systems → Multimedia information systems; • Computing methodologies → Computer vision.
Background: Both stent retriever (SR) and contact aspiration (CA) are widely used as first-line strategies for acute posterior circulation strokes (PCS). However, it is still unclear how CA and SR compare as the first-line treatment of acute PCS. Several new studies have been published recently, so we aimed to perform an updated meta-analysis.
Methods: The meta-analysis was conducted according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analysis) statement. Random-effects models were used to pool the outcomes, and the I² statistic was calculated to assess heterogeneity.
Results: Ten observational studies with 1189 patients were included, among whom 492 received first-line CA and 697 received first-line SR. The pooled results showed that, compared with first-line SR, first-line CA achieved a significantly higher proportion of modified Thrombolysis In Cerebral Infarction (mTICI) 2b/3 (OR 1.90, 95% CI 1.33 to 2.71, I²=0%), mTICI 3 (OR 1.95, 95% CI 1.15 to 3.31, I²=59.6%), and first-pass effect (OR 2.91, 95% CI 1.51 to 5.58, I²=0%), a lower incidence of new-territory embolic events (OR 0.20, 95% CI 0.05 to 0.83, I²=0%), and a shorter procedure time (mean difference −29.4 min, 95% CI −46.8 to −12.0 min, I²=62.8%). At 90-day follow-up, patients treated with first-line CA showed a higher rate of functional independence (modified Rankin Scale score 0–2; OR 1.38, 95% CI 1.01 to 1.87, I²=23.5%) and lower mortality (OR 0.71, 95% CI 0.50 to 1.00, p=0.050, I²=0%) than those treated with first-line SR.
Conclusions: This meta-analysis suggests that the first-line CA strategy could achieve better recanalization and clinical outcomes for acute PCS than first-line SR. Given the limited quality of the included studies, this conclusion should be interpreted with caution.
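The pooling step named in the Methods (random-effects models plus the I² statistic) can be illustrated with a minimal sketch. The abstract does not say which between-study variance estimator was used; the DerSimonian-Laird estimator below is an assumption, and the function name `pool_random_effects` and the input values in the usage lines are hypothetical, not data from the included studies.

```python
import math


def pool_random_effects(log_ors, variances):
    """Pool per-study log odds ratios with a random-effects model.

    Assumption: DerSimonian-Laird estimate of the between-study
    variance (tau^2); the source does not name its estimator.
    """
    k = len(log_ors)
    # Inverse-variance (fixed-effect) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sw
    # Cochran's Q measures dispersion around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # I^2: share of total variability attributed to heterogeneity, in percent.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sws = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_ors)) / sws
    se = math.sqrt(1.0 / sws)
    return {
        "or": math.exp(pooled),
        "ci_low": math.exp(pooled - 1.96 * se),
        "ci_high": math.exp(pooled + 1.96 * se),
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical inputs: three studies' log odds ratios and their variances.
result = pool_random_effects(
    [math.log(1.8), math.log(2.1), math.log(1.5)],
    [0.09, 0.16, 0.12],
)
print(result)
```

When I² is 0%, tau² is also 0 and the random-effects result coincides with the fixed-effect one, which is why several of the outcomes reported above with I²=0% would be unaffected by the choice of model.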