Abstract. A major issue with Time of Flight sensors is the presence of multipath interference. We present Sparse Reflections Analysis (SRA), an algorithm for removing this interference which has two main advantages. First, it allows for very general forms of multipath, including interference with three or more paths, diffuse multipath resulting from Lambertian surfaces, and combinations thereof. SRA removes this general multipath with robust techniques based on L1 optimization. Second, due to a novel dimension reduction, we are able to produce a very fast version of SRA, which is able to run at frame rate. Experimental results on both synthetic data with ground truth, as well as real images of challenging scenes, validate the approach.
This paper describes a system that generates speaker-annotated transcripts of meetings by using a microphone array and a 360-degree camera. The hallmark of the system is its ability to handle overlapped speech, which has been an unsolved problem in realistic settings for over a decade. We show that this problem can be addressed by using a continuous speech separation approach. In addition, we describe an online audio-visual speaker diarization method that leverages face tracking and identification, sound source localization, speaker identification, and, if available, prior speaker information for robustness to various real-world challenges. All components are integrated in a meeting transcription framework called SRD, which stands for "separate, recognize, and diarize". Experimental results using recordings of natural meetings involving up to 11 attendees are reported. The continuous speech separation improves the word error rate (WER) by 16.1% compared with a highly tuned beamformer. When a complete list of meeting attendees is available, the discrepancy between WER and speaker-attributed WER is only 1.0%, indicating accurate word-to-speaker association. This increases marginally to 1.6% when 50% of the attendees are unknown to the system.
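For readers unfamiliar with the metric quoted above, the following is a minimal sketch of the standard word error rate: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the abstract does not say which scoring tool or text normalization the authors used, and speaker-attributed WER is not covered here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()  # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One reference word ("the") is dropped out of six.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, this ratio can exceed 1 when the hypothesis is much longer than the reference.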
For the past few years researchers have been investigating enhancing tracking performance by combining several different tracking algorithms. We propose an analytically justified, probabilistic framework to combine multiple tracking algorithms. The separate tracking algorithms considered output a probability distribution function of the tracked state, sequentially for each image. The algorithms may output either an explicit probability distribution function, or a sample-set of it via CONDENSATION. The proposed framework is general and allows the combination of any set of separate tracking algorithms of this kind, even on different state spaces of different dimensionality, under a few reasonable assumptions. The combination may consist of different tracking algorithms that track a common object, as well as algorithms that track separate, albeit related objects, thus improving the tracking performance of each object. In many of the investigated settings, our approach allows us to treat the separate tracking algorithms as "closed boxes". In other words, only the state distributions in the input and output are needed for the combination process. The suggested framework was successfully tested using various state spaces and datasets.
This paper addresses the "boundary ownership" problem, also known as the figure/ground
Abstract. Over the past few years researchers have been investigating the enhancement of visual tracking performance by devising trackers that simultaneously make use of several different features. In this paper we investigate the combination of synchronous visual trackers that use different features while treating the trackers as "black boxes". That is, instead of fusing the usage of the different types of data as has been performed in previous work, the combination here is allowed to use only the trackers' output estimates, which may be modified before their propagation to the next time step. We propose a probabilistic framework for combining multiple synchronous trackers, where each separate tracker outputs a probability density function of the tracked state, sequentially for each image. The trackers may output either an explicit probability density function, or a sample-set of it via CONDENSATION. Unlike previous tracker combinations, the proposed framework is fairly general and allows the combination of any set of trackers of this kind, even in different state-spaces of different dimensionality, under a few reasonable assumptions. The combination may consist of different trackers that track a common object, as well as trackers that track separate, albeit related objects, thus improving the tracking performance of each object. The benefits of merely using the final estimates of the separate trackers in the combination are twofold. Firstly, the framework for the combination is fairly general and may be easily used from the software aspects. Secondly, the combination may be performed in a distributed setting, where each separate tracker runs on a different site and uses different data, while avoiding the need to share the data. The suggested framework was successfully tested using various state-spaces and datasets, demonstrating that fusing the trackers' final distribution estimates may indeed be applicable.
Cross-bin metrics have been shown to be more suitable than bin-by-bin metrics for measuring the distance between histograms in various applications. In particular, a visual tracker that minimizes the earth mover's distance (EMD) between the candidate and reference feature histograms has recently been proposed. This tracker was shown to be more robust than the Mean Shift tracker, which employs a bin-by-bin metric. In each frame, the former tracker iteratively shifts the candidate location by one pixel in the direction opposite to the EMD's gradient until no improvement is made. This optimization process involves the clustering of the candidate feature density in feature space, as well as the computation of the EMD between the candidate and reference feature histograms after each shift of the candidate location. In this paper, alternative trackers that employ cross-bin metrics as well, but that are based on Mean Shift (MS) iterations, are derived. The proposed trackers are simpler and faster due to 1) the use of MS-based optimization, which is not restricted to single pixel shifts, 2) abstention from any clustering of feature densities, and 3) abstention from EMD computations in multidimensional spaces.
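The distinction between bin-by-bin and cross-bin metrics drawn above can be illustrated with a small sketch. It is not the tracker from the abstract: it only compares a bin-by-bin L1 distance with the earth mover's distance in the special case of one-dimensional, normalized histograms with unit bin spacing, where the EMD reduces to the L1 distance between cumulative sums. The function names `bin_by_bin_l1` and `emd_1d` are hypothetical.

```python
from itertools import accumulate


def bin_by_bin_l1(p, q):
    """Bin-by-bin metric: compares each bin only with the same-index bin."""
    return sum(abs(a - b) for a, b in zip(p, q))


def emd_1d(p, q):
    """Cross-bin metric: EMD for 1-D histograms of equal total mass.

    Assumes unit spacing between adjacent bins; under that assumption the
    EMD equals the L1 distance between the two cumulative histograms.
    """
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


if __name__ == "__main__":
    reference = [1.0, 0.0, 0.0, 0.0, 0.0]
    near = [0.0, 1.0, 0.0, 0.0, 0.0]  # mass shifted by one bin
    far = [0.0, 0.0, 0.0, 0.0, 1.0]   # mass shifted by four bins

    # The bin-by-bin metric cannot tell the two candidates apart ...
    print(bin_by_bin_l1(reference, near), bin_by_bin_l1(reference, far))
    # ... while the cross-bin metric grows with the size of the shift.
    print(emd_1d(reference, near), emd_1d(reference, far))
```

The bin-by-bin distance saturates as soon as the histograms stop overlapping, whereas the cross-bin distance keeps reflecting how far the mass has moved, which is one intuition for the robustness claim in the abstract.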