In contrast to the widely studied problem of recognizing an action from a complete sequence, action anticipation aims to identify the action from only partially observed videos. It is therefore key to the success of computer vision applications that must react as early as possible, such as autonomous navigation. In this paper, we propose a new action anticipation method that achieves high prediction accuracy even when only a very small fraction of a video sequence is available. To this end, we develop a multi-stage LSTM architecture that leverages context-aware and action-aware features, and introduce a novel loss function that encourages the model to predict the correct class as early as possible. Our experiments on standard benchmark datasets demonstrate the benefits of our approach; we outperform the state-of-the-art action anticipation methods for early prediction by a relative accuracy increase of 22.0% on JHMDB-21, 14.0% on UT-Interaction, and 49.9% on UCF-101.
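The idea of a loss that rewards early correct predictions can be sketched as follows. This is a minimal illustrative form, assuming a simple linear time-dependent weighting of the cross-entropy on the true class; the weighting scheme and the function name are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def anticipation_loss(probs, true_class, t, T):
    """Cross-entropy on the true class, scaled so that mistakes made
    later in the sequence (larger t) are penalized more heavily, which
    pushes the model to commit to the correct class early.
    The linear t/T weighting is an illustrative assumption."""
    weight = t / T  # grows from near 0 (start of video) to 1 (full sequence)
    return -weight * np.log(probs[true_class] + 1e-12)

# The same wrong prediction costs more when made late in the sequence:
p = np.array([0.7, 0.2, 0.1])
loss_early = anticipation_loss(p, true_class=0, t=1, T=10)
loss_late = anticipation_loss(p, true_class=0, t=10, T=10)
```

Under this weighting, an uncertain prediction at frame 1 incurs a tenth of the penalty of the same prediction at the final frame, so confident early commitment is favored.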
This paper presents a comparative analysis of different pedestrian dataset characteristics. The main goal of the research is to determine which characteristics are desirable for improved training and validation of pedestrian detectors and classifiers. The work focuses on those aspects of the dataset that affect classification success using the most common boosting methods. Dataset characteristics such as image size, aspect ratio, geometric variance, and the relative scale of positive class instances (pedestrians) within the training window form an integral part of classification success. This paper examines the effects of varying these dataset characteristics with a view to determining the recommended attributes of a high-quality and challenging dataset. While the primary focus is on characteristics of the positive training dataset, some discussion of desirable attributes for the negative dataset is important and is therefore included. This paper also serves to publish our current pedestrian dataset in various forms for non-commercial use by the scientific community. We believe the published dataset to be one of the largest, most flexible, and most representative datasets available for pedestrian/person detection tasks.
This paper presents a weak classifier that is extremely fast to compute, yet highly discriminant. This weak classifier may be used in, for example, a boosting framework, and is the result of a novel way of organizing and evaluating Histograms of Oriented Gradients. The method requires only one access to main memory to evaluate each feature, compared with the better-known Haar features, which require between six and nine memory accesses per feature. This low memory bandwidth makes the weak classifier especially well suited to small systems with little or no memory cache available. The presented weak classifier has been extensively tested in a boosted framework on datasets consisting of pedestrians and various road signs. The classifier yields detection results that are far superior to those obtained from Haar features when tested on road signs and similar structures, whereas the detection results are comparable to those of Haar features when tested on pedestrians. In addition, the computational resources necessary for these results have been shown to be considerably smaller for the new weak classifier.
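The single-memory-access idea can be sketched as follows: precompute a per-pixel image of quantized gradient-orientation codes once, so that evaluating a weak classifier at run time reduces to one array read plus a table lookup. All names, the bin quantization, and the response-table design here are illustrative assumptions, not the paper's actual feature layout.

```python
import numpy as np

def precompute_orientation_image(gray, n_bins=8):
    """Quantize each pixel's gradient orientation into one of n_bins.
    An illustrative stand-in for a precomputed HOG-style image: the
    expensive gradient work is done once, up front, per frame."""
    gy, gx = np.gradient(gray.astype(np.float64))
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation in [0, pi)
    return np.minimum((ang / np.pi * n_bins).astype(np.int64), n_bins - 1)

class LookupWeakClassifier:
    """Weak classifier whose evaluation is a single read of the
    precomputed image: the bin code at (y, x) indexes a learned
    real-valued response table (training of the table not shown)."""
    def __init__(self, y, x, responses):
        self.y, self.x = y, x
        self.responses = responses  # one boosting score per orientation bin

    def evaluate(self, orient_img):
        # One memory access into the precomputed image, one table lookup.
        return self.responses[orient_img[self.y, self.x]]
```

A Haar feature, by contrast, must read several integral-image corners per evaluation, which is where the six-to-nine memory accesses mentioned above come from.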
Action anticipation is critical in scenarios where one needs to react before the action is finalized. This is, for instance, the case in automated driving, where a car needs to, e.g., avoid hitting pedestrians and respect traffic lights. While solutions have been proposed to tackle subsets of the driving anticipation tasks by making use of diverse, task-specific sensors, there is no single dataset or framework that addresses them all in a consistent manner. In this paper, we therefore introduce a new, large-scale dataset, called VIENA², covering 5 generic driving scenarios with a total of 25 distinct action classes. It contains more than 15K full-HD, 5s-long videos acquired in various driving conditions, weather conditions, times of day, and environments, complemented with a common and realistic set of sensor measurements. This amounts to more than 2.25M frames, each annotated with an action label, corresponding to 600 samples per action class. We discuss our data acquisition strategy and the statistics of our dataset, and benchmark state-of-the-art action anticipation techniques, including a new multi-modal LSTM architecture with an effective loss function for action anticipation in driving scenarios.
This paper presents a Histogram of Oriented Gradients (HOG) based weak classifier that is extremely fast to compute and highly discriminative. This feature set has been developed in an effort to balance the required processing and memory bandwidth so as to eliminate bottlenecks during run-time evaluation. The feature set is the next generation in a series of features based on a novel precomputed image for HOG-based features. It contains features that are more balanced in terms of processing and memory requirements than its predecessors, has a larger and richer feature space, and is more discriminant on a per-feature basis. In terms of computational complexity, it is a heterogeneous feature set; that is, it has both fast and slow variants. In order to optimize our feature selection between the faster and slower features available, we implement a recently proposed modification to the RealBoost feature selection rule. This modification provides an additional means to balance processing and memory bandwidth on ordinary PC architectures. The feature set is suitable for use within typical boosting frameworks. It is compared to Haar and Rectangular HOG features, as well as the related HistFeat feature. The new feature set contains two variants, LiteHOG and LiteHOG+, which we compare. Both LiteHOG and LiteHOG+ show promising results on road sign and pedestrian detection tasks.
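A cost-aware selection rule of the kind described above can be sketched as follows. This is a minimal sketch assuming a linear cost penalty subtracted from each candidate's boosting score; the trade-off parameter and the penalty form are illustrative assumptions, not the actual modified RealBoost rule.

```python
import numpy as np

def select_weak_classifier(scores, costs, lam=0.1):
    """Pick the candidate weak classifier that maximizes its boosting
    score minus a penalty proportional to its evaluation cost, so a
    slightly weaker but much cheaper feature can win the round.
    The linear penalty lam * cost is an illustrative assumption."""
    scores = np.asarray(scores, dtype=np.float64)
    costs = np.asarray(costs, dtype=np.float64)
    return int(np.argmax(scores - lam * costs))

# A slow feature (cost 10) must beat a fast one (cost 1) by a wide
# margin in raw score before it is selected:
chosen = select_weak_classifier(scores=[0.90, 0.95], costs=[1.0, 10.0])
```

With `lam=0`, the rule degenerates to plain best-score selection, which is how a heterogeneous fast/slow feature set would otherwise fill up with the slow variants.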