2018
DOI: 10.1109/tcsvt.2017.2727963
Example-Based 3D Trajectory Extraction of Objects From 2D Videos

Cited by 5 publications (4 citation statements)
References 28 publications
“…1, we make the following observations: (1) Our method outperforms all the baselines and competitors, establishing new state-of-the-art with 74.1%, 84.3% and 86.8% on 1-shot, 3-shot and 5-shot settings, respectively. (2) We notice that the strong baseline "RGB Basenet++" is superior to "RGB Basenet", which is mainly contributed by our basic sampling strategy (section 3.3). It quite makes sense, since the clips of length num_seg * num_f sampled by our method capture more temporal information, especially in the case of long-term videos.…”
Section: Results On Kinetics
confidence: 93%
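The sampling strategy quoted above can be illustrated with a minimal sketch: split a video into num_seg equal segments and take num_f consecutive frames from the head of each, yielding a clip of length num_seg * num_f. The names follow the quoted notation; the exact segment offsets used by the citing paper are an assumption.

```python
def sample_clip(total_frames: int, num_seg: int, num_f: int) -> list[int]:
    """Return frame indices of length num_seg * num_f."""
    seg_len = total_frames // num_seg
    indices = []
    for s in range(num_seg):
        start = s * seg_len  # take frames from the head of each segment
        indices.extend(range(start, start + num_f))
    return indices

# e.g. a 300-frame video, 4 segments, 8 frames each -> 32 frame indices
clip = sample_clip(300, num_seg=4, num_f=8)
```

Spreading the segments across the whole video is what lets a fixed-length clip still cover long-term temporal structure.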
“…To the best of our knowledge, we are the first to introduce depth as the carrier of scene information for video recognition under few-shot learning. Note that [2] estimates the depth to correct the camera odometry. Most of the multi-modality models simply fuse the features from different streams by averaging [45], concatenation [56], recurrent neural networks [69] or with fully connected layers [21], while we adaptively learn how to combine the RGB stream and depth stream by introducing a novel depth guided adaptive instance normalization module.…”
Section: Related Work
confidence: 99%
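The "depth guided adaptive instance normalization" mentioned in the quote can be sketched with the generic AdaIN formulation: normalize the RGB feature map per channel, then re-scale and shift it with statistics taken from the depth stream. This is a standard AdaIN fusion sketch, not the citing paper's exact module.

```python
import numpy as np

def adain_fuse(rgb_feat: np.ndarray, depth_feat: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Fuse (C, H, W) feature maps: RGB content, depth statistics."""
    mu_r = rgb_feat.mean(axis=(1, 2), keepdims=True)
    std_r = rgb_feat.std(axis=(1, 2), keepdims=True) + eps
    mu_d = depth_feat.mean(axis=(1, 2), keepdims=True)
    std_d = depth_feat.std(axis=(1, 2), keepdims=True)
    # whiten the RGB channels, then inject the depth channel statistics
    return std_d * (rgb_feat - mu_r) / std_r + mu_d
```

Unlike averaging or concatenation, this lets the depth stream modulate the RGB features channel by channel rather than simply being appended to them.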
“…The existing approach offers importance to extracting the semantic factor associated with capturing essential features for performing the track. Boukhers et al [22] have discussed a probability-based model for obtaining trajectories of a three-dimensional object from two-dimensional video feeds. The model can estimate the object depth from the calculated focal length.…”
Section: Introduction
confidence: 99%
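The depth-from-focal-length idea in the quote rests on the standard pinhole-camera relation: an object of known real-world height H, imaged h pixels tall at focal length f (in pixels), lies at depth Z = f * H / h. The sketch below only illustrates this geometric relation; the cited probabilistic model is more involved.

```python
def pinhole_depth(focal_px: float, real_height_m: float, image_height_px: float) -> float:
    """Depth in metres under the pinhole camera model."""
    return focal_px * real_height_m / image_height_px

# e.g. f = 1000 px, a 1.7 m person imaged 170 px tall -> 10 m away
z = pinhole_depth(1000.0, 1.7, 170.0)
```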
“…The key factor of system quality is to know the exact present position of a vehicle, and to predict its future position accurately by monitoring the movement of a vehicle for collision avoidance [4] [5].…”
Section: Introduction
confidence: 99%