A novel method is proposed for generic target tracking by audio measurements from a microphone array. To cope with noisy environments characterized by persistent and high energy interfering sources, a classification map (CM) based on spectral signatures is calculated by means of a machine learning algorithm. Next, the CM is combined with the acoustic map, describing the spatial distribution of sound energy, in order to obtain a cleaned joint map in which contributions from the disturbing sources are removed. A likelihood function is derived from this map and fed to a particle filter yielding the target location estimation on the acoustic image. The method is tested on two real environments, addressing both speaker and vehicle tracking. The comparison with a couple of trackers, relying on the acoustic map only, shows a sharp improvement in performance, paving the way to the application of audio tracking in real challenging environments.
Capturing the essential characteristics of visual objects by considering how their features are inter-related is a recent philosophy of object classification. In this paper, we embed this principle in a novel image descriptor, dubbed Heterogeneous Auto-Similarities of Characteristics (HASC). HASC is applied to heterogeneous dense features maps, encoding linear relations by covariances and nonlinear associations through information-theoretic measures such as mutual information and entropy. In this way, highly complex structural information can be expressed in a compact, scale invariant and robust manner. The effectiveness of HASC is tested on many diverse detection and classification scenarios, considering objects, textures and pedestrians, on widely known benchmarks Brodatz,
Encoding an object essence in terms of self-similarities between its parts is becoming a popular strategy in Computer Vision. In this paper, a new similarity-based descriptor, dubbed Structural Similarity Cross-Covariance Tensor is proposed, aimed to encode relations among different regions of an image in terms of cross-covariance matrices. The latter are calculated between low-level feature vectors extracted from pairs of regions. The new descriptor retains the advantages of the widely used covariance matrix descriptors [1], extending their expressiveness from local similarities inside a region to structural similarities across multiple regions. The new descriptor, applied on top of HOG, is tested on object and scene classification tasks with three datasets. The proposed method always outclasses baseline HOG and yields significant improvement over a recently proposed self-similarity descriptor in the two most challenging datasets.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.