We present a novel method for multiple people tracking that leverages a generalized model for capturing interactions among individuals. At the core of our model lies a learned dictionary of interaction feature strings which capture relationships between the motions of targets. These feature strings, created from low-level image features, lead to a much richer representation of the physical interactions between targets compared to hand-specified social force models that previous works have introduced for tracking. One disadvantage of using social forces is that all pedestrians must be detected in order for the forces to be applied, while our method is able to encode the effect of undetected targets, making the tracker more robust to partial occlusions. The interaction feature strings are used in a Random Forest framework to track targets according to the features surrounding them. Results on six publicly available sequences show that our method outperforms state-of-the-art approaches in multiple people tracking.
In this paper, we treat the problem of continuous pose estimation for object categories as a regression problem on the basis of only 2D training information. While regression is a natural framework for continuous problems, regression methods have so far achieved inferior results compared with 3D-based and 2D-based classification-and-refinement approaches. This may be attributed to their sensitivity to high intra-class variability, as well as to noisy matching procedures and a lack of geometrical constraints. We propose to apply regression to Fisher-encoded vectors computed from large cells by learning an array of Fisher regressors. Fisher encoding makes our algorithm tolerant of variations in class appearance, while the array structure indirectly introduces spatial context information into the approach. We formulate our problem as a MAP inference problem, where the likelihood function is composed of a generative term based on the prediction error generated by the ensemble of Fisher regressors as well as a discriminative term based on SVM classifiers. We test our algorithm on three publicly available datasets that present several difficulties, such as high intra-class variability, truncations, occlusions, and motion blur, obtaining state-of-the-art results.
We present a feature-based framework that combines spatial feature clustering, guided sampling for pose generation, and model updating for 3D object recognition and pose estimation. Existing methods fail in the presence of repeated patterns or multiple instances of the same object, as they rely only on feature discriminability for matching and on the estimator capabilities for outlier rejection. We propose to spatially separate the features before matching to create smaller clusters containing the object. Then, hypothesis generation is guided by exploiting cues collected off-line and on-line, such as feature repeatability, 3D geometric constraints, and feature occurrence frequency. Finally, while previous methods overload the model with synthetic features for wide baseline matching, we claim that continuously updating the model representation is a lighter yet reliable strategy. The evaluation of our algorithm on challenging video sequences shows the improvement provided by our contribution.
In this paper, we propose a method to consistently recover the pose of an object from a known class in a video sequence. As individual poses estimated from monocular images are rather noisy, we optimally aggregate pose evidence over all video frames. We construct a graph where nodes are values sampled from the pose posterior distributions computed by a continuous pose estimator in each frame of the sequence. We then find the globally optimal pose path through the graph that best explains the pose evidence for the whole sequence. As a result, we recover the correct object orientation at each frame even if single-frame pose evidence is sometimes inaccurate. We evaluate our approach on two publicly available car datasets, which encompass busy street scenarios and car races with significant changes in car orientation, blur, and occlusions. We show that our method outperforms state-of-the-art approaches, reducing the error by 40% on the challenging KITTI dataset.
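The idea of a globally optimal path through per-frame pose samples can be illustrated with a small dynamic-programming sketch. This is not the authors' implementation: the abstract does not specify the edge weights or the search algorithm, so the Viterbi-style recursion, the squared angular-difference penalty, and the names `angular_diff`, `best_pose_path`, and `smooth_weight` are all assumptions made for illustration.

```python
import math


def angular_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def best_pose_path(candidates, log_probs, smooth_weight=0.01):
    """Pick one pose candidate per frame so that the whole path scores best.

    candidates[t] -- angles (degrees) sampled from the posterior in frame t
    log_probs[t]  -- log posterior value of each candidate in frame t
    smooth_weight -- hypothetical weight on the squared angular change
                     between consecutive frames (an assumed edge cost)
    """
    # score[t][j]: best total score of any path ending at candidate j of frame t
    score = [list(log_probs[0])]
    back = []  # back[t-1][j]: index of the best predecessor of candidate j

    for t in range(1, len(candidates)):
        row_score, row_back = [], []
        for j, angle in enumerate(candidates[t]):
            best_i, best_val = 0, -math.inf
            for i, prev_angle in enumerate(candidates[t - 1]):
                penalty = smooth_weight * angular_diff(prev_angle, angle) ** 2
                val = score[t - 1][i] - penalty
                if val > best_val:
                    best_i, best_val = i, val
            row_score.append(best_val + log_probs[t][j])
            row_back.append(best_i)
        score.append(row_score)
        back.append(row_back)

    # Backtrack from the best final node to recover the full path.
    idx = max(range(len(score[-1])), key=score[-1].__getitem__)
    path = [idx]
    for row_back in reversed(back):
        idx = row_back[idx]
        path.append(idx)
    path.reverse()
    return [candidates[t][i] for t, i in enumerate(path)]
```

In a toy sequence where the middle frame's strongest single-frame candidate is flipped by roughly 180 degrees, the path search prefers the weaker but temporally consistent candidate, which is the behaviour the abstract describes when single-frame evidence is inaccurate.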