We tackle the problem of learning object detectors without supervision. Unlike weakly-supervised object detection, we do not assume image-level class labels. Instead, we extract a supervisory signal from audio-visual data, using the audio component to "teach" the object detector. While this problem is related to sound source localisation, it is considerably harder because the detector must classify the objects by type, enumerate each instance of the object, and do so even when the object is silent. We tackle this problem by first designing a self-supervised framework with a contrastive objective that jointly learns to classify and localise objects. Then, without using any supervision, we use these self-supervised labels and boxes to train an image-based object detector. With this, we outperform previous unsupervised and weakly-supervised detectors on the tasks of object detection and sound source localisation. We also show that we can align this detector to ground-truth classes with as little as one label per pseudo-class, and show how our method can learn to detect generic objects that go beyond instruments, such as airplanes and cats.
We consider an efficient computational framework for speeding up several machine learning algorithms with almost no loss of accuracy. The proposed framework relies on projections via structured matrices that we call Structured Spinners, which are formed as products of three structured matrix-blocks that incorporate rotations. The approach is highly generic, i.e. i) structured matrices under consideration can either be fully-randomized or learned, ii) our structured family contains as special cases all previously considered structured schemes, iii) the setting extends to the non-linear case where the projections are followed by non-linear functions, and iv) the method finds numerous applications including kernel approximations via random feature maps, dimensionality reduction algorithms, new fast cross-polytope LSH techniques, deep learning, convex optimization algorithms via Newton sketches, quantization with random projection trees, and more. The proposed framework comes with theoretical guarantees characterizing the capacity of the structured model in reference to its unstructured counterpart and is based on a general theoretical principle that we describe in the paper. As a consequence of our theoretical analysis, we provide the first theoretical guarantees for one of the most efficient existing LSH algorithms based on the $HD_3 HD_2 HD_1$ structured matrix [Andoni et al., 2015]. The exhaustive experimental evaluation confirms the accuracy and efficiency of structured spinners for a variety of different applications. Footnotes: 1 equal contribution; 2 partly supported by NSF grant CCF-1421161.
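As a rough illustration of the kind of structured projection this abstract refers to, the sketch below applies an $HD_3 HD_2 HD_1$ transform to a vector. It assumes, since the abstract does not spell this out, that $H$ is the normalised Walsh-Hadamard matrix and that each $D_i$ is an independent random diagonal matrix with $\pm 1$ entries. The function names `fwht` and `hd3hd2hd1` are hypothetical and are not taken from the paper.

```python
import numpy as np


def fwht(x):
    """Normalised fast Walsh-Hadamard transform of a length-2^k vector."""
    x = np.asarray(x, dtype=float).copy()
    n = x.shape[0]
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly step: combine blocks of size h pairwise (sum, difference).
        blocks = x.reshape(-1, 2, h)
        total = blocks[:, 0, :] + blocks[:, 1, :]
        diff = blocks[:, 0, :] - blocks[:, 1, :]
        x = np.stack([total, diff], axis=1).reshape(n)
        h *= 2
    # Dividing by sqrt(n) makes the transform orthogonal.
    return x / np.sqrt(n)


def hd3hd2hd1(x, sign_diagonals):
    """Apply H D3 H D2 H D1 to x; sign_diagonals = (d1, d2, d3), applied in order.

    Assumption: each d_i is a vector of +/-1 entries (the diagonal of D_i).
    """
    y = np.asarray(x, dtype=float)
    for d in sign_diagonals:
        y = fwht(d * y)
    return y


rng = np.random.default_rng(0)
n = 16  # must be a power of two for the Hadamard transform
signs = [rng.choice([-1.0, 1.0], size=n) for _ in range(3)]
x = rng.standard_normal(n)
y = hd3hd2hd1(x, signs)
```

Each of the three blocks costs O(n log n) time instead of the O(n^2) of a dense matrix-vector product, and only the 3n random signs need to be stored. Under the assumptions above every block is orthogonal, so the transform preserves Euclidean norms, which is the sense in which it acts as a rotation.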
Left-handers comprise approximately 15% of professional tennis players, but only 11% of the general population. In boxing, baseball, fencing, table-tennis and specialist batting positions in cricket the contrast is even starker, with 30% or more of top players often being left-handed. In this paper we propose a model for identifying the advantage of being left-handed in one-on-one interactive sports (as well as the inherent skill of each player). We construct a Bayesian latent ability model in the spirit of the classic Glicko model but with the additional complication of having a latent factor, i.e. the advantage of left-handedness, that we need to estimate. Inference is further complicated by the truncated nature of datasets that contain only the top players. We show how to infer the advantage of left-handedness when only the proportion of top left-handed players is available. We use this result to develop a simple dynamic model for inferring how the advantage of left-handedness varies through time. We also extend the model to cases where we have ranking or match-play data. We test these models on 2014 match-play data from top male professional tennis players, and the dynamic model on data from 1985 to 2016.
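To see why the proportion of left-handers among top players carries information about the advantage, consider the following Monte Carlo sketch. It is not the paper's model: it assumes, purely for illustration, that innate skill is standard normal, that left-handers make up 11% of the population (the figure quoted above), and that left-handedness adds a fixed offset `advantage` to effective skill. The function name `top_left_fraction` and all parameter values are hypothetical.

```python
import numpy as np


def top_left_fraction(advantage, n_players=200_000, n_top=2_000,
                      p_left=0.11, seed=0):
    """Fraction of left-handers among the n_top players by effective skill.

    Assumptions (illustrative only): skill ~ N(0, 1), and left-handers
    receive an additive bonus `advantage` on top of their innate skill.
    """
    rng = np.random.default_rng(seed)
    left = rng.random(n_players) < p_left      # handedness indicator
    skill = rng.standard_normal(n_players)     # innate skill
    effective = skill + advantage * left       # skill plus left-hand bonus
    # Indices of the n_top largest effective skills (order within top is irrelevant).
    top = np.argpartition(effective, -n_top)[-n_top:]
    return left[top].mean()


frac_no_advantage = top_left_fraction(0.0)
frac_with_advantage = top_left_fraction(0.3)
```

With no advantage the top group mirrors the population rate, while a positive offset over-represents left-handers in the truncated top sample. Inverting this relationship, from an observed top-player proportion back to the size of the advantage, is the kind of inference the abstract describes.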
Terms such as exploitation, exploration, intensification and diversification are routinely employed in the metaheuristic literature to explain empirical runtime performance. Six prevalent views on exploitation and exploration are identified in the literature, each expressing a different aspect of these notions. The consistency and meaningfulness of these views are substantiated by their deducibility from the proposed novel definitions of exploitation and exploration, based on the hypothetical construct of a probable fitness landscape. This unifies, and thereby clarifies, the terminology and understanding of metaheuristics.