Temporal Segment Connection Network for Action Recognition

Li, Qian; Yang, Wenzhu; Chen, Xiangyang; Yuan, Tongtong; Wang, Yuxia

doi:10.1109/access.2020.3027386

Cited by 10 publications

(7 citation statements)

References 29 publications


“…Since the video descriptor from the whole video could not be extracted, this method might not suitable for videos with various durations. Temporal segment network (TSN) [ 24 ] was designed for capturing features from the whole frame sequences with their modified two-stream networks. Unlike a traditional method, the segment consensus function was added as a post-processing step.…”

Section: Input Data For TAPG Network (mentioning)

confidence: 99%


A Comprehensive Review on Temporal-Action Proposal Generation

Sooksatra

Watcharapinchai

2022

J. Imaging


Temporal-action proposal generation (TAPG) is a well-known pre-processing of temporal-action localization and mainly affects localization performance on untrimmed videos. In recent years, there has been growing interest in proposal generation. Researchers have recently focused on anchor- and boundary-based methods for generating action proposals. The main purpose of this paper is to provide a comprehensive review of temporal-action proposal generation with network architectures and empirical results. The pre-processing step for input data is also discussed for network construction. The content of this paper was obtained from the research literature related to temporal-action proposal generation from 2012 to 2022 for performance evaluation and comparison. From several well-known databases, we used specific keywords to select 71 related studies according to their contributions and evaluation criteria. The contributions and methodologies are summarized and analyzed in a tabular form for each category. The result from state-of-the-art research was further analyzed to show its limitations and challenges for action proposal generation. TAPG performance in average recall ranges from 60% up to 78% in two TAPG benchmarks. In addition, several future potential research directions in this field are suggested based on the current limitations of the related studies.



“…Moreover, the base module was designed specifically for RGB and optical flow from video descriptor. The two-stream network [ 24 ] was utilized to extract the rich local temporal video representation as input, to exploit the rich local behaviors within the video sequence.…”

Section: The Review Of TAPG Network (mentioning)

confidence: 99%

A Comprehensive Review on Temporal-Action Proposal Generation

Sooksatra

Watcharapinchai

2022

J. Imaging


“…Recently, many works for video classification [5], [16], [21], [23], [29], [32], [37], [42] have focused on an ability to model the temporal variation, dynamics of an action (i.e., visual tempo [42]), called temporal modeling in literature. Unlike 2D image classification, video classification should distinguish visual tempo variation as well as its semantic appearance.…”

Section: Introduction (mentioning)

confidence: 99%

Diverse Temporal Aggregation and Depthwise Spatiotemporal Factorization for Efficient Video Classification

Lee

Kim

Yun

et al. 2021

IEEE Access


Video classification researches have recently attracted attention in the fields of temporal modeling and efficient 3D convolutional architectures. However, the temporal modeling methods are not efficient, and there is little interest in how to deal with temporal modeling in the 3D efficient architectures. To build an efficient 3D architecture for temporal modeling, we propose a new 3D backbone network, called VoV3D, that consists of a temporal one-shot aggregation (T-OSA) module and a depthwise factorized component, D(2+1)D. The T-OSA is devised to build a feature hierarchy by aggregating spatiotemporal features with different temporal receptive fields. Stacking this T-OSA enables the network itself to model short-range as well as long-range temporal relationships across frames without any external modules. We also design a depthwise spatiotemporal factorization module, D(2+1)D, that decomposes a 3D depthwise convolution into two spatial and temporal depthwise convolutions for efficient architecture. Through the proposed temporal modeling method (T-OSA) and the efficient factorization module (D(2+1)D), we construct two types of VoV3D networks: VoV3D-M and VoV3D-L. Thanks to its efficiency and effectiveness of their temporal modeling, VoV3D-L has 4× fewer model parameters and 14× less computation, surpassing the state-of-the-art TEA model [22] on both Something-Something and Kinetics-400 datasets. Furthermore, VoV3D shows better performance for temporal modeling than the efficient X3D [7]. We hope that VoV3D can serve as a baseline for efficient temporal modeling architecture. The code and models are available at https://github.com/youngwanLEE/VoV3D.


“…In modern society, pirated goods have become a global issue as a source of illicit funds for crime organizations. [ 1 ] Some countermeasures such as physical anti‐counterfeiting technologies have been developed to prevent replication in various forms: watermarks, [ 2,3 ] intaglio printing, [ 4 ] luminescent ink, [ 5,6 ] thermal emissive label, [ 7–10 ] and magnetic ink. [ 11,12 ] Most importantly, thermal emissive labels using tailored infrared (IR) emissivity implemented using photonic structures have received significant research attention as a promising anti‐counterfeiting candidate owing to their facile design and fabrication process for a textured metal surface such as photonic crystal cavities, [ 13–16 ] nano‐antennas, [ 17–19 ] metamaterials, [ 20–22 ] and gratings.…”

Section: Introduction (mentioning)

confidence: 99%

Colored, Covert Infrared Display through Hybrid Planar‐Plasmonic Cavities

Lee

Kim

Yoo

et al. 2021

Advanced Optical Materials


Artificial covert infrared (IR) displays have recently emerged as an anti‐counterfeiting method for spontaneous thermal emissive surfaces and optically encoded information. However, the unnatural appearance of a conventional thermal emissive label in the visible region limits the widespread application of an artificial covert IR display. This paper presents a colored, covert IR display exhibiting visible color patterns and thermally encoded data simultaneously based on a hybrid planar‐plasmonic cavity (HPPC). The HPPC is composed of two spectrally distinguished resonant structures: 1) an ultrathin planar cavity with an amorphous silicon (a‐Si) layer on gold (Au) for visible coloration and 2) an IR plasmonic cavity with hole‐patterned Au on a polymer substrate with a back mirror for thermal data encoding. Such hybridization of multi‐band resonance can not only enhance the vivid coloration but also enhance the data storage capacity per unit. Camouflage labels with encrypted thermal data are successfully demonstrated for practical applications using a flexible HPPC. Collectively, the proposed HPPC enables a new type of anti‐counterfeiting method that achieves both esthetic, visibly encoded data, and covert, thermally encoded data.

