Convolutional Networks With Channel and STIPs Attention Model for Action Recognition in Videos
2020 | DOI: 10.1109/tmm.2019.2953814

Cited by 29 publications (12 citation statements)
References 49 publications
“…In this sense, a new line of approaches is also emerging, namely the use of transformers (Girdhar et al., 2019; Liu et al., 2019) and attention mechanisms (Ke et al., 2019; Qiao et al., 2020; Wu et al., 2020). Commonly for fine-grained action recognition, the frame or sequence of frames incorporates irrelevant or redundant information, with no discriminatory property.…”
Section: Vision-Based DL Methods for HAR/HAP (mentioning)
confidence: 99%
“…So, these algorithms guide the model to use attentional regions, instead of the whole frame, to enhance local features and achieve selective feature fusion. For example, Wu et al. (2020) implemented channel-wise and spatial attention mechanisms, along with baseline CNNs (VGG16 and ResNet-50) and an LSTM. Additionally, compared to LSTMs, transformers can be a lighter and perhaps more suitable alternative for online performance (Kozlov et al., 2020).…”
Section: Vision-Based DL Methods for HAR/HAP (mentioning)
confidence: 99%
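
For readers unfamiliar with the mechanisms this excerpt refers to, the following is a minimal PyTorch-style sketch of channel-wise and spatial attention modules that reweight per-frame CNN feature maps before temporal modeling. The class names, reduction ratio, and tensor shapes are illustrative assumptions, not the exact implementation of Wu et al. (2020) or of the paper under review.

```python
# Minimal sketch (assumed names/shapes): channel attention reweights feature-map
# channels via global pooling + a bottleneck MLP; spatial attention reweights
# locations via a small convolution over pooled maps.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # (B, C, 1, 1)
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                            # x: (B, C, H, W)
        w = self.mlp(self.pool(x).flatten(1))        # (B, C) channel gates in [0, 1]
        return x * w.view(x.size(0), -1, 1, 1)       # reweight channels

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg_map = x.mean(dim=1, keepdim=True)        # (B, 1, H, W)
        max_map = x.amax(dim=1, keepdim=True)
        gate = torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * gate                              # reweight spatial locations

# Usage: refine per-frame backbone features (e.g. a ResNet-50 conv output),
# then pool each frame to a vector and feed the sequence to an LSTM.
frame_feats = torch.randn(2, 2048, 7, 7)             # dummy batch of frame features
refined = SpatialAttention()(ChannelAttention(2048)(frame_feats))
```

The channel branch learns which feature maps matter and the spatial branch learns which locations matter, which is the sense in which such attention suppresses the irrelevant or redundant frame content mentioned in the excerpt above.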
“…Liu et al. [65] explored residual and squeeze-and-excitation structures for feature extraction and proposed context beam search to integrate the Transformer-based [63] language model into CTC-based methods. The attention mechanism, which has been widely adopted in scene text recognition [35], [30], [67], action recognition [37], [38], and video processing [39], can also be applied to offline HCTR. Xiu et al. [40] improved the attention-based decoder by a multi-level multi-modal fusion network.…”
Section: A. Offline Handwritten Chinese Text Recognition (mentioning)
confidence: 99%
“…Recent years have witnessed rapid growth of deep learning models, especially deep convolutional neural networks (CNNs). With the successful applications of CNNs in other low-level vision tasks like super resolution [41] and image denoising [50], CNNs have also been widely used for the SIRR problem [10], [13]-[15], [26], [34], [35], [42], [46], [49]. People usually deploy functional blocks from network structures such as residual nets [17], [35], dense nets [10], [20] and squeeze-and-excitation nets [19], [26] to enhance their networks.…”
Section: Introduction (mentioning)
confidence: 99%
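
As a generic illustration of the "functional blocks" named in this excerpt, here is a minimal sketch of a squeeze-and-excitation residual block. The class name and reduction ratio are assumptions for illustration, not code from the cited SIRR papers.

```python
# Minimal sketch of an SE-enhanced residual block, of the kind commonly used
# to strengthen image-restoration CNNs (assumed names; not the cited papers' code).
import torch
import torch.nn as nn

class SEResidualBlock(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # Squeeze-and-excitation: global pooling + bottleneck 1x1 convs give
        # per-channel gates that rescale the residual branch.
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        res = self.body(x)
        return x + res * self.se(res)        # identity shortcut + gated residual

block = SEResidualBlock(64)
out = block(torch.randn(1, 64, 32, 32))      # shape preserved: (1, 64, 32, 32)
```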