2017
DOI: 10.4218/etrij.17.0116.0054

Extensible Hierarchical Method of Detecting Interactive Actions for Video Understanding

Abstract: For video understanding, namely analyzing who did what in a video, actions along with objects are primary elements. Most studies on actions have handled recognition problems for a well-trimmed video and focused on enhancing their classification performance. However, action detection, including localization as well as recognition, is required because, in general, actions intersect in time and space. In addition, most studies have not considered extensibility for a newly added action that has been previously tra…

Cited by 7 publications (9 citation statements)
References 44 publications
“…As wearable devices are widely used nowadays, Human Activity Recognition (HAR) has become an emerging research area in mobile and wearable computing. Recognizing human activity leads to a deep understanding of individuals’ activity patterns or long-term habits, which contributes to developing a wide range of user-centric applications such as human-computer interaction [1,2], surveillance [3,4], video streaming [5,6], AR/VR [7], and healthcare systems [8,9]. Although activity recognition has been investigated to a great extent in the computer vision area [10,11,12], its application is limited to scenarios equipped with pre-installed cameras of sufficient resolution and a guaranteed angle of view.…”
Section: Introduction
confidence: 99%
“…To resolve this issue, a two-stream CNNs model was proposed by Simonyan and Zisserman [27] to explicitly take into account both spatial and temporal information in a single end-to-end learning framework.…” [The remainder of this excerpt is a flattened survey table listing benchmark datasets — ActivityNet [29], ActionNet-VE [30], UCF101 [31], THUMOS'14 [32], Something-Something v1 [38] and v2 [33], VIVA Hand Gestures [34], UTD-MHAD [35], EgoGesture [36], AVA 2.1 [37], Charades [39], NTU RGB+D 60 [45] and 120 [40], miniSports [41], Sports-1M [26], IRD [42,43], HMDB-51 [44], ICVL-4 [42,43], and Jester [46] — together with per-year methods (PDF, LSTM, ontology/rule, 2D/3D/i3D CNN, CNN+attention) and their reported accuracies; the table structure is not recoverable.]
Section: Multi-resolution
confidence: 99%
“…Stacked Fisher vectors were used by Peng et al. [79], further improving iDT. However, Wang et al. [47] have used a variational Bayes method, while Moon et al. [49], through an ontology- and rule-based methodology, have produced benchmark action recognition techniques on RGB-based datasets.…”
Section: RGB-Data-Based Techniques
confidence: 99%
“…Furthermore, activity recognition has been widely reported in many fields using sensor modalities, including ambient sensors [35], wearable sensors [36], smartphones [34], and smartwatches [37]. These sensors contribute to developing a wide range of application domains such as sport [38], human-computer interaction [39], surveillance [40], video streaming [41], healthcare systems [42], and the computer vision area [43]. Due to the properties of noninvasive sensors, some studies have discussed how to monitor human activities using this type of sensor (i.e., non-visual sensors) because they are both easy to install and privacy-preserving [44,45].…”
Section: Activity Recognition-Based Supervised Learning
confidence: 99%