Hollywood 3D: What are the Best 3D Features for Action Recognition?

Hadfield, Simon; Lebeda, Karel; Bowden, Richard

doi:10.1007/s11263-016-0917-2

Cited by 15 publications (9 citation statements)

References 51 publications (54 reference statements)


“…This offers the possibility of tapping into millions of high-quality images from an ever-growing library of content. We note that 3D movies have been used in related tasks in isolation [49], [50]. We will show that their full potential is unlocked by combining them with other, complementary data sources.…”

Section: 3D Movies (mentioning)

confidence: 91%

Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-Shot Cross-Dataset Transfer

Ranftl, Lasinger, Hafner et al., 2022

IEEE Trans. Pattern Anal. Mach. Intell.

698

603


The success of monocular depth estimation relies on large and diverse training sets. Due to the challenges associated with acquiring dense ground-truth depth across different environments at scale, a number of datasets with distinct characteristics and biases have emerged. We develop tools that enable mixing multiple datasets during training, even if their annotations are incompatible. In particular, we propose a robust training objective that is invariant to changes in depth range and scale, advocate the use of principled multi-objective learning to combine data from different sources, and highlight the importance of pretraining encoders on auxiliary tasks. Armed with these tools, we experiment with five diverse training datasets, including a new, massive data source: 3D films. To demonstrate the generalization power of our approach we use zero-shot cross-dataset transfer, i.e. we evaluate on datasets that were not seen during training. The experiments confirm that mixing data from complementary sources greatly improves monocular depth estimation. Our approach clearly outperforms competing methods across diverse datasets, setting a new state of the art for monocular depth estimation.
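The abstract above mentions a training objective that is invariant to changes in depth range and scale. The sketch below illustrates one common way to build such an objective, assuming a least-squares alignment: the prediction is fitted to the ground truth with a per-image scale s and shift t, and the loss is the mean squared residual after alignment. It is an illustration under that assumption, not necessarily the exact loss used by Ranftl et al.; the function names `align_scale_shift` and `ssi_mse` are hypothetical.

```python
import numpy as np


def align_scale_shift(pred, target, mask):
    """Least-squares scale s and shift t minimising sum((s*pred + t - target)^2) over valid pixels."""
    p = pred[mask]
    g = target[mask]
    # Normal equations of the 2-parameter linear fit, solved for [s, t].
    a = np.array([[np.dot(p, p), p.sum()], [p.sum(), p.size]], dtype=float)
    b = np.array([np.dot(p, g), g.sum()], dtype=float)
    s, t = np.linalg.solve(a, b)
    return s, t


def ssi_mse(pred, target, mask):
    """Scale- and shift-invariant MSE: align pred to target, then average squared residuals."""
    s, t = align_scale_shift(pred, target, mask)
    residual = (s * pred[mask] + t) - target[mask]
    return float(np.mean(residual ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.uniform(0.1, 1.0, size=(4, 5))  # stand-in ground-truth disparity map
    mask = np.ones_like(target, dtype=bool)       # all pixels treated as valid
    pred = 3.0 * target - 0.5                     # same structure, different scale and shift
    print(ssi_mse(pred, target, mask))            # prints a value very close to 0
```

Because s and t are re-estimated per image, a prediction that differs from the ground truth only by scale and offset incurs (numerically) zero loss, which is what allows datasets with incompatible depth ranges to be mixed. The 2x2 system is singular when the prediction is constant over the mask, so a practical version would guard against that case.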


“…
Method | Accuracy
2D TSD | 78%
3D TSD | 74%
2D TSD+3D TSD | 85%
[20] | 20.8%
[24] | 21.8%
…designed to cope specifically with random camera motions and/or rotations, as they can degrade the trajectory extraction drastically. As it can be seen, our method still yields superior results compared to the trajectory aligned descriptors proposed in [20] and reported in [24]. Our method also outperforms the method proposed by [24] in terms of accuracy.…”

Section: Methods Accuracy (mentioning)

confidence: 99%

“…Hadfield et al [24] used 3D Hollywood movies to create a challenging stereo dataset for human activity recognition. The authors estimated the calibration information using RANSAC method and repeating the process 100 times, before selecting the best estimation.…”

Section: Background Work (mentioning)

confidence: 99%
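The quoted description of Hadfield et al. [24] (RANSAC, repeated 100 times, keeping the best estimate) follows a generic best-of-N restart pattern. The sketch below shows that pattern on a deliberately simple stand-in model, a 2D line y = a*x + b, rather than the stereo calibration estimated in the paper. The inlier threshold, the per-run iteration count, and the use of inlier count as the "best" criterion are assumptions, and `ransac_line` and `repeated_ransac` are hypothetical names.

```python
import numpy as np


def ransac_line(points, rng, iterations=50, threshold=0.05):
    """One RANSAC run fitting y = a*x + b; returns ((a, b), inlier_count)."""
    best_params, best_count = None, -1
    n = len(points)
    for _ in range(iterations):
        # Minimal sample for a line: two distinct points.
        i, j = rng.choice(n, size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if np.isclose(x1, x2):
            continue  # degenerate sample (vertical line), skip it
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        residuals = np.abs(points[:, 1] - (a * points[:, 0] + b))
        count = int(np.sum(residuals < threshold))
        if count > best_count:
            best_params, best_count = (a, b), count
    return best_params, best_count


def repeated_ransac(points, runs=100, seed=0):
    """Repeat the whole RANSAC run `runs` times and keep the estimate with the most inliers."""
    rng = np.random.default_rng(seed)
    results = [ransac_line(points, rng) for _ in range(runs)]
    return max(results, key=lambda r: r[1])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = np.linspace(0.0, 1.0, 50)
    inliers = np.column_stack([x, 2.0 * x + 1.0])  # points on y = 2x + 1
    outliers = np.column_stack([rng.uniform(0, 1, 20), rng.uniform(-5, 5, 20)])
    (a, b), count = repeated_ransac(np.vstack([inliers, outliers]))
    print(a, b, count)
```

For calibration, the minimal sample and residual would change (for example, point correspondences and an epipolar error), but the restart-and-select loop stays the same.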

Disparity-Augmented Trajectories for Human Activity Recognition

Habashi, Boufama, Ahmad, 2019

Preprint


Numerous methods for human activity recognition have been proposed in the past two decades. Many of these methods are based on sparse representation, which describes the whole video content by a set of local features. Trajectories, being mid-level sparse features, are capable of describing the motion of an interest-point in 2D space. 2D trajectories might be affected by viewpoint changes, potentially decreasing their accuracy. In this paper, we initially propose and compare different 2D trajectory-based algorithms for human activity recognition. Moreover, we propose a new way of fusing disparity information with 2D trajectory information, without the calculation of 3D reconstruction. The obtained results show a 2.76% improvement when using disparity-augmented trajectories, compared to using the classical 2D trajectory information only. Furthermore, we have also tested our method on the challenging Hollywood 3D dataset, and we have obtained competitive results, at a faster speed.
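The Habashi et al. abstract describes fusing disparity with 2D trajectory information without computing a 3D reconstruction. One plausible reading, sketched below as a hypothetical illustration rather than the authors' formulation, is to sample the disparity map at each tracked point and append the normalised frame-to-frame disparity changes to a dense-trajectory-style shape descriptor (displacements normalised by their total magnitude). The input layout (points as (x, y) per frame, disparity maps as an array of shape (L, H, W)), the nearest-pixel lookup, and the function names are assumptions.

```python
import numpy as np


def trajectory_shape(points):
    """Shape descriptor: frame-to-frame (dx, dy) displacements normalised by their total magnitude."""
    disp = np.diff(points, axis=0)  # (L-1, 2)
    total = np.sum(np.linalg.norm(disp, axis=1))
    return (disp / total).ravel() if total > 0 else np.zeros(disp.size)


def disparity_augmented_shape(points, disparity_maps):
    """Append normalised disparity changes sampled along the track (no 3D reconstruction)."""
    # Nearest-pixel lookup; points are (x, y) = (column, row), clipped to the image bounds.
    cols = np.clip(np.rint(points[:, 0]).astype(int), 0, disparity_maps.shape[2] - 1)
    rows = np.clip(np.rint(points[:, 1]).astype(int), 0, disparity_maps.shape[1] - 1)
    d = disparity_maps[np.arange(len(points)), rows, cols]  # (L,) disparity in each frame
    dd = np.diff(d)  # (L-1,) disparity change per step
    scale = np.sum(np.abs(dd))
    dd_norm = dd / scale if scale > 0 else np.zeros_like(dd)
    return np.concatenate([trajectory_shape(points), dd_norm])


if __name__ == "__main__":
    pts = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])  # a 3-frame track moving right
    maps = np.zeros((3, 4, 4))
    maps[1] = 1.0
    maps[2] = 3.0  # the tracked point moves towards the camera
    print(disparity_augmented_shape(pts, maps))
```

Because only disparity values along the track are read, no triangulation or camera calibration is needed; the extra L-1 entries simply extend the 2(L-1)-dimensional shape descriptor.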


“…The DiDeMo dataset [6] has been introduced for temporal localization given natural language, but has also been used for the purpose of text-to-clip video retrieval [317]. Recently, the Hollywood 3D dataset was proposed [93] which contains 650 stereo clips with 14 action classes, together with stereo calibration and depth reconstruction.…”

Section: Other Datasets (mentioning)

confidence: 99%

A survey of traditional and deep learning-based feature descriptors for high dimensional data in computer vision

Georgiou, Liu, Chen et al., 2019

Int J Multimed Info Retr

108


Higher dimensional data such as video and 3D are the leading edge of multimedia retrieval and computer vision research. In this survey, we give a comprehensive overview and key insights into the state of the art of higher dimensional features from deep learning and also traditional approaches. Current approaches are frequently using 3D information from the sensor or are using 3D in modeling and understanding the 3D world. With the growth of prevalent application areas such as 3D games, self-driving automobiles, health monitoring and sports activity training, a wide variety of new sensors have allowed researchers to develop feature description models beyond 2D. Although higher dimensional data enhance the performance of methods on numerous tasks, they can also introduce new challenges and problems. The higher dimensionality of the data often leads to more complicated structures which present additional problems in both extracting meaningful content and in adapting it for current machine learning algorithms. Due to the major importance of the evaluation process, we also present an overview of the current datasets and benchmarks. Moreover, based on more than 330 papers from this study, we present the major challenges and future directions.
