In this paper, a simple yet efficient activity recognition method for first-person video is introduced. The proposed method is well suited to representing high-dimensional features such as those extracted from convolutional neural networks (CNNs). The per-frame (per-segment) extracted features are treated as a set of time series, and inter- and intra-time-series relations are employed to build the video descriptors. To find the inter-series relations, the series are grouped and the linear correlation between each pair of groups is calculated; these relations capture scene dynamics and local motions. The introduced grouping strategy considerably reduces the computational cost. Furthermore, we split the series along the temporal direction to preserve long-term motions and to better focus on each local time window. To extract cyclic motion patterns, which can be regarded as primary components of various activities, intra-time-series correlations are exploited. The representation method yields highly discriminative features that can be classified linearly. The experiments confirm that our method outperforms the state-of-the-art methods on recognizing first-person activities on two challenging first-person datasets.

Index Terms-Human activity recognition; first-person activity recognition; feature encoding; feature representation; convolutional neural network.

I. INTRODUCTION

Human action recognition has become an active research field in the recent decade [1-6], owing to its numerous applications, such as visual surveillance, entertainment devices, elderly people assistance, human-computer interaction, and video indexing/retrieval. Despite the many efforts on recognizing human activities, it remains a difficult problem in real-world applications. Intrinsic similarities between different actions yield small inter-class variations.
On the other hand, there are large intra-class variations caused by camera motion, illumination changes, background clutter, viewpoint changes, irrelevant motions, and varying styles/speeds.

Videos taken from an actor's own viewpoint are called first-person videos. Although much research has been conducted on third-person activity recognition, those methods cannot be directly applied to first-person videos because of the major differences between the two kinds of video. The main difference is that the person wearing the camera is involved in the activity; as a consequence, strong ego-motion frequently occurs in this kind of video. It should also be noted that most first-person video analysis requires a real-time response, so the computational complexity should be considered more carefully [7].

In recent years, the number of videos captured from the first-person viewpoint has grown rapidly due to the increasing use of wearable cameras [8]. Many applications have emerged, such as life logging, elderly (or blind) people assistance, military applications, and ro...
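The correlation-based representation summarized in the abstract can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes a per-frame feature matrix of shape (T, D), averages each group of feature dimensions into a representative series, computes inter-group Pearson correlations per temporal split, and adds a lag-1 intra-series correlation per group as a simple proxy for cyclic-motion cues. The function name and all parameters are hypothetical.

```python
import numpy as np

def correlation_descriptor(features, n_groups=4, n_splits=2):
    """Sketch of a correlation-based video descriptor.

    features : (T, D) array of per-frame (or per-segment) CNN features,
               each of the D dimensions viewed as a time series.
    Returns a 1-D descriptor concatenating, for each temporal split:
      - inter-group linear correlations (upper triangle), and
      - per-group lag-1 autocorrelations (intra-series, cyclic cue).
    """
    T, D = features.shape
    groups = np.array_split(np.arange(D), n_groups)   # group the series
    splits = np.array_split(np.arange(T), n_splits)   # split along time
    parts = []
    for idx in splits:
        seg = features[idx]                           # (t, D) local window
        # One representative series per group: mean over its dimensions.
        g = np.stack([seg[:, gi].mean(axis=1) for gi in groups], axis=1)
        # Inter-group relations: pairwise Pearson correlation matrix.
        C = np.corrcoef(g.T)
        iu = np.triu_indices(n_groups, k=1)
        parts.append(C[iu])
        # Intra-series relation: lag-1 correlation of each group series.
        ac = [np.corrcoef(g[:-1, j], g[1:, j])[0, 1] for j in range(n_groups)]
        parts.append(np.array(ac))
    return np.nan_to_num(np.concatenate(parts))       # guard constant series
```

Grouping before correlating is what keeps the cost manageable: with D raw series there are O(D^2) pairs, while with n_groups representative series there are only n_groups*(n_groups-1)/2.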