ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
DOI: 10.1109/icassp.2019.8682698
|View full text |Cite
|
Sign up to set email alerts
|

Object and Text-guided Semantics for CNN-based Activity Recognition

Abstract: Many previous methods have demonstrated the importance of considering semantically relevant objects for carrying out video-based human activity recognition, yet none of the methods have harvested the power of large text corpora to relate the objects and the activities to be transferred into learning a unified deep convolutional neural network. We present a novel activity recognition CNN which co-learns the object recognition task in an end-to-end multitask learning scheme to improve upon the baseline activity … Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
1
1

Citation Types

0
2
0

Year Published

2020
2020
2022
2022

Publication Types

Select...
3
2

Relationship

1
4

Authors

Journals

citations
Cited by 5 publications
(2 citation statements)
references
References 22 publications
(20 reference statements)
0
2
0
Order By: Relevance
“…• Multiple task-specific layers. Multi-task learning, performing multiple tasks with different layers in a common learned space, is widely used and has presented better accuracy than using multiple single task networks [9], [48]- [51].…”
Section: B Building Cross-domain Pretrained Modelmentioning
confidence: 99%
“…• Multiple task-specific layers. Multi-task learning, performing multiple tasks with different layers in a common learned space, is widely used and has presented better accuracy than using multiple single task networks [9], [48]- [51].…”
Section: B Building Cross-domain Pretrained Modelmentioning
confidence: 99%
“…Examples of colearning algorithms include co-training, zero-shot learning and concept learning. Recent examples include: the use of information extraction in text processing to guide entity detection in computer vision algorithms, such as for visual object recognition, when an implicit instrument argument can be inferred for an event mentioned in a caption accompanying an image (Subburathinam et al, 2019) and multitask learning that exploits a text-guided semantic space to select the most relevant visual objects for novel visual activity recognition (Eum et al, 2019). The challenges that arise in multimodal processing may involve various combinations of these five categories.…”
Section: Levels Of Representation / Abstraction / Granularity / Align...mentioning
confidence: 99%