2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2017.503
Identifying First-Person Camera Wearers in Third-Person Videos

Abstract: We consider scenarios in which we wish to perform joint scene understanding, object tracking, activity recognition, and other tasks in environments in which multiple people are wearing body-worn cameras while a third-person static camera also captures the scene. To do this, we need to establish person-level correspondences across first- and third-person videos, which is challenging because the camera wearer is not visible from his/her own egocentric video, preventing the use of direct feature matching. In this p…

Cited by 43 publications (48 citation statements)
References 27 publications (34 reference statements)
“…Baselines: We first implement multiple baselines to compare performance across inputs and models. These baseline methods are proposed in peer research [14,27,23], including a spatial-domain siamese network [14], a motion-domain siamese network [14], a two-stream semi-siamese network [14], a triplet network [27], and a temporal-domain image and flow network [14,23]. We also demonstrate the effect of weight sharing in the siamese network.…”
Section: Results and Comparison
confidence: 91%
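The siamese baselines quoted above all rest on the same idea: one embedding network with shared weights processes both the first- and third-person inputs, and a distance between the two embeddings drives a contrastive objective. A minimal pure-Python sketch of that weight-sharing and loss computation follows; the toy `WEIGHTS` matrix and 2-d inputs are hypothetical placeholders, not the architecture from any of the cited papers.

```python
import math

# Hypothetical toy weights standing in for a learned embedding network.
# In a siamese setup the SAME weights process both branches (weight sharing).
WEIGHTS = [[0.5, -0.2], [0.1, 0.3]]

def embed(x):
    """Shared-weight 'network': a 2x2 linear map over a 2-d feature vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in WEIGHTS]

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def contrastive_loss(x1, x2, same, margin=1.0):
    """Contrastive loss: pull matching pairs together, push others apart."""
    d = euclidean(embed(x1), embed(x2))
    return d ** 2 if same else max(0.0, margin - d) ** 2

# A matching pair with identical inputs has zero embedding distance,
# hence zero loss; a mismatched pair inside the margin is penalized.
print(contrastive_loss([1.0, 0.0], [1.0, 0.0], same=True))   # 0.0
```

In a real system, `embed` would be a deep network over video frames (spatial domain) or optical flow (motion domain), trained end-to-end; the "semi-siamese" variants in the quote relax full weight sharing between the two branches.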
“…The authors proposed a 'Graph' representation for temporal and spatial matching. In [14], the authors solved the task of localizing the person in the third-person view, given both the third-person and egocentric camera frames. In that paper, spatial-domain semi-siamese, motion-domain semi-siamese, dual-domain semi-siamese, and dual-domain semi-triplet networks are studied in detail.…”
Section: Related Work
confidence: 99%
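The semi-triplet networks mentioned above use a triplet objective rather than a pairwise one: an anchor embedding should be closer to a positive (same camera wearer) than to a negative (different person) by some margin. A short sketch of that hinge loss on plain Python lists, with hypothetical 2-d embeddings and a margin chosen for illustration:

```python
import math

def dist(a, b):
    # Euclidean distance between two embedding vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge loss: anchor must be closer to positive than to negative by margin."""
    return max(0.0, dist(anchor, positive) - dist(anchor, negative) + margin)

# An easy triplet (negative much farther than positive) yields zero loss.
a, p, n = [0.0, 0.0], [0.1, 0.0], [2.0, 0.0]
print(triplet_loss(a, p, n))  # 0.0
```

Compared to the pairwise contrastive objective, the triplet form only constrains relative distances, which tends to be easier to satisfy when absolute embedding scales vary across first- and third-person domains.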
“…Our hypothesis is that simultaneous segmentation and matching is mutually beneficial: segmentation helps refine matching by producing finer-grained appearance features (compared to bounding boxes), which are important in crowded scenes with many occlusions, while matching helps locate a person of interest and produce better segmentation masks, which in turn help in tasks like activity and action recognition. We show that previous work [14] is a special case of ours, since we can naturally handle their first- and third-person cases. We evaluate on two publicly available datasets augmented with pixel-level annotations, showing that we achieve significantly better results than numerous baselines.…”
Section: Introduction
confidence: 84%
“…That paper's approach is applicable in closed settings with overhead cameras (e.g., a museum), but not in unconstrained environments such as our law enforcement example. Fan et al [14] relax many assumptions, allowing arbitrary third-person camera views and including evidence based on scene appearance. Zheng et al [43] consider the distinct problem of identifying the same person appearing in multiple wearable camera videos (but not trying to identify the camera wearers themselves).…”
Section: Introduction
confidence: 99%