2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019
DOI: 10.1109/cvpr.2019.01226
TACNet: Transition-Aware Context Network for Spatio-Temporal Action Detection

Figure 1: Diagram of the transitional state. Some ambiguous states occur around, but do not belong to, the target actions, and they are hard to distinguish. We define these states as the "transitional state" (red boxes). Distinguishing them effectively improves temporal extent detection.

Abstract: Current state-of-the-art approaches for spatio-temporal action detection have achieved impressive results but remain unsatisfactory for temporal extent detection. The main reason comes from …

Cited by 60 publications (30 citation statements)
References 25 publications (41 reference statements)
"…Their effect has been confirmed in other fields [25-35, 39], such as object detection, anomaly detection, and so on. Multi-scale modules [25-29] use branches at different scales to collect complementary ice sheet radar image features at different levels and merge them into multi-scale features, remedying the poor feature extraction ability of a single-scale method. Attention modules [30-34] assign weights to different types of features from a global perspective to suppress noise, refine important ice boundary features, and fit boundaries in radar topology sequences…"
Section: Related Work
confidence: 92%
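The multi-scale and attention modules described in the quoted statement can be illustrated with a minimal numpy sketch. This is an assumption-laden toy (average pooling per branch and softmax channel weights rather than learned convolutions); all function names here are hypothetical, not from the cited papers.

```python
import numpy as np

def multi_scale_features(x, scales=(1, 2, 4)):
    """Pool a (C, H, W) feature map at several scales, upsample each branch
    back to full resolution (nearest neighbour), and concatenate along the
    channel axis -- a toy stand-in for a multi-scale module. Assumes H and W
    are divisible by every scale."""
    c, h, w = x.shape
    branches = []
    for s in scales:
        # average-pool with a stride-s window
        pooled = x.reshape(c, h // s, s, w // s, s).mean(axis=(2, 4))
        # nearest-neighbour upsample back to (h, w)
        up = pooled.repeat(s, axis=1).repeat(s, axis=2)
        branches.append(up)
    return np.concatenate(branches, axis=0)

def channel_attention(x):
    """Reweight channels by softmaxed global-average-pooled responses --
    a minimal sketch of 'assigning weights from the global perspective'."""
    pooled = x.mean(axis=(1, 2))                       # (C,)
    weights = np.exp(pooled) / np.exp(pooled).sum()    # softmax over channels
    return x * weights[:, None, None]

feat = np.random.rand(3, 8, 8)
ms = multi_scale_features(feat)    # (9, 8, 8): three 3-channel branches
att = channel_attention(ms)        # same shape, channels reweighted
print(ms.shape, att.shape)
```

Note that the scale-1 branch is an identity, so the first three channels of `ms` reproduce the input; real modules would instead apply a learned convolution per branch before fusion.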
"…Therefore, to address these problems, the following two aspects are considered in our study. To further improve the representation of features in regions adjacent to ice layers, multi-scale features [25-29] can be used to extract richer scale features of the ice layers. In addition, an attention mechanism can capture long-term relationships between ice layers and fuse the context information of radar images by attending to ice sheet radar image features at different levels [30-34]…"
Section: Motivation
confidence: 99%
"…Comparing the two-stream and 3D-convolution methods, two-stream is more accurate but less efficient than the latter. How to better combine the advantages of both is a possible research direction. b) Action detection will extend from temporal action detection to spatio-temporal action detection [39]. That is to say, detection should move from a one-dimensional temporal interval to a two-dimensional spatio-temporal box, which can detect actions more comprehensively…"
Section: Future Directions and Trends
confidence: 99%
"…They utilize the Sobel operator and element-wise subtraction to calculate the spatial and temporal gradients, respectively. Song et al. [26] propose the Discriminative Motion Cue (DMC) to reduce noise in motion vectors and capture fine motion details. They train the DMC generator to approximate optical flow using a reconstruction loss and an adversarial loss, jointly with the downstream action classification task…"
Section: B. Spatiotemporal Two-Stream
confidence: 99%
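The gradient computation mentioned in the quoted statement (Sobel for spatial gradients, element-wise subtraction for temporal gradients) can be sketched as follows. This is an illustrative reimplementation under the usual Sobel-kernel definition and zero padding, not the cited paper's actual code.

```python
import numpy as np

def sobel_xy(frame):
    """Spatial gradients of a 2D frame via 3x3 Sobel kernels, computed with
    an explicit (slow but transparent) convolution and zero padding."""
    kx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)   # horizontal-gradient kernel
    ky = kx.T                                   # vertical-gradient kernel
    h, w = frame.shape
    padded = np.pad(frame.astype(float), 1)
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            win = padded[i:i + 3, j:j + 3]
            gx[i, j] = (win * kx).sum()
            gy[i, j] = (win * ky).sum()
    return gx, gy

def temporal_gradient(frame_t, frame_t1):
    """Temporal gradient as element-wise subtraction of consecutive frames."""
    return frame_t1.astype(float) - frame_t.astype(float)

f0 = np.random.rand(16, 16)
f1 = np.random.rand(16, 16)
gx, gy = sobel_xy(f0)          # spatial gradients of frame t
gt = temporal_gradient(f0, f1) # temporal gradient between t and t+1
print(gx.shape, gy.shape, gt.shape)
```

In practice the Sobel step would use an optimized library routine (e.g. `scipy.ndimage.sobel`); the loop here only makes the kernel arithmetic explicit.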