Graph-Based Global Reasoning Networks

Chen, Yunpeng; Rohrbach, Marcus; Yan, Zhicheng; Yan, Shuicheng; Feng, Jiashi; Kalantidis, Yannis

doi:10.1109/cvpr.2019.00052

Cited by 483 publications

(288 citation statements)

References 28 publications

Supporting

Mentioning

270

Contrasting


“…Our ip-CSN-152, pre-trained on Sports1M outperforms I3D [3], R(2+1)D [32], and S3D-G [40] by 8.1%, 4.9%, and 4.5%, respectively. It also outperforms recent work: A²-Net [4] by 4.6%, Global-reasoning networks [6] by 3.1%. We note that our ip-CSN-152 achieves higher accuracy than both I3D with Non-local Networks (NL) [37] and SlowFast [10] (+1.5% and +0.3%) while being also faster (3.3x and 2x, respectively).…”

Section: Comparison With the State-of-the-art (mentioning)

confidence: 64%

Video Classification With Channel-Separated Convolutional Networks

Tran, Wang, Feiszli, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)


Group convolution has been shown to offer great computational savings in various 2D convolutional architectures for image classification. It is natural to ask: 1) if group convolution can help to alleviate the high computational cost of video classification networks; 2) what factors matter the most in 3D group convolutional networks; and 3) what are good computation/accuracy trade-offs with 3D group convolutional networks. This paper studies the effects of different design choices in 3D group convolutional networks for video classification. We empirically demonstrate that the amount of channel interactions plays an important role in the accuracy of 3D group convolutional networks. Our experiments suggest two main findings. First, it is a good practice to factorize 3D convolutions by separating channel interactions and spatiotemporal interactions as this leads to improved accuracy and lower computational cost. Second, 3D channel-separated convolutions provide a form of regularization, yielding lower training accuracy but higher test accuracy compared to 3D convolutions. These two empirical findings lead us to design an architecture, the Channel-Separated Convolutional Network (CSN), which is simple, efficient, yet accurate. On Sports1M, Kinetics, and Something-Something, our CSNs are comparable with or better than the state-of-the-art while being 2-3 times more efficient.
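
The factorization described in this abstract can be illustrated with a rough weight count. The sketch below is not taken from the paper: it assumes the 3D convolution is split into a 1×1×1 pointwise convolution (channel interactions) followed by a depthwise k×k×k convolution (spatiotemporal interactions), and the function names and the 256-channel layer are hypothetical.

```python
def conv3d_params(c_in, c_out, k=3):
    """Weights in a standard k x k x k 3D convolution (bias ignored)."""
    return c_in * c_out * k ** 3


def channel_separated_params(c_in, c_out, k=3):
    """Weights after the assumed split: a 1x1x1 pointwise convolution
    mixes channels, then a depthwise k x k x k convolution filters
    each output channel independently over space and time."""
    pointwise = c_in * c_out   # channel interactions only
    depthwise = c_out * k ** 3  # one k^3 filter per channel
    return pointwise + depthwise


# Hypothetical layer with 256 input and 256 output channels.
standard = conv3d_params(256, 256)
separated = channel_separated_params(256, 256)
print(standard, separated, round(standard / separated, 1))
```

This per-layer ratio is not the same thing as the "2-3 times more efficient" figure in the abstract, which refers to whole networks.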


“…In the above experiment, we have compared and shown that OctConv is complementary with a set of state-of-the-art CNNs [16,17,47,22,18,34,19]. In this part, we compare OctConv with MG-Conv [25], GloRe [8], Elastic [43] and bL-Net [4] which share a similar idea as our method. Seven groups of results are shown in Table 4.…”

Section: Comparing With SOTAs On ImageNet (mentioning)

confidence: 99%

Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks With Octave Convolution

Chen, Fan, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

Self Cite


In natural images, information is conveyed at different frequencies where higher frequencies are usually encoded with fine details and lower frequencies are usually encoded with global structures. Similarly, the output feature maps of a convolution layer can also be seen as a mixture of information at different frequencies. In this work, we propose to factorize the mixed feature maps by their frequencies, and design a novel Octave Convolution (OctConv) operation to store and process feature maps that vary spatially "slower" at a lower spatial resolution, reducing both memory and computation cost. Unlike existing multi-scale methods, OctConv is formulated as a single, generic, plug-and-play convolutional unit that can be used as a direct replacement of (vanilla) convolutions without any adjustments in the network architecture. It is also orthogonal and complementary to methods that suggest better topologies or reduce channel-wise redundancy like group or depth-wise convolutions. We experimentally show that by simply replacing convolutions with OctConv, we can consistently boost accuracy for both image and video recognition tasks, while reducing memory and computational cost. An OctConv-equipped ResNet-152 can achieve 82.9% top-1 classification accuracy on ImageNet with merely 22.2 GFLOPs.
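
The memory saving claimed in this abstract can be made concrete with a small calculation. The sketch below rests on assumptions not stated on this page: a fraction `alpha` of the channels is treated as low-frequency and stored at half the height and half the width, and the function name and tensor sizes are hypothetical.

```python
def octave_memory_ratio(alpha, height, width, channels):
    """Activation storage of a split feature map relative to a
    full-resolution one, assuming a fraction `alpha` of channels is
    kept at half resolution in each spatial dimension."""
    low = int(alpha * channels)   # low-frequency channels
    high = channels - low         # high-frequency channels
    full = channels * height * width
    split = high * height * width + low * (height // 2) * (width // 2)
    return split / full


# Hypothetical 64-channel, 56x56 feature map with half the channels low-frequency.
ratio = octave_memory_ratio(0.5, 56, 56, 64)
print(ratio)
```

Under these assumptions each low-frequency channel takes a quarter of the storage of a full-resolution channel, so the saving grows with `alpha`.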


“…[15] proposed an efficient attention computation mechanism called Criss-Cross Network for semantic segmentation. [5] used the idea of bilateral filter to learn robust weighting model for object recognition. Besides, "attention" has also been proposed for image super-resolution and shown its great potential.…”

Section: Related Work (mentioning)

confidence: 99%

Image Super-Resolution via Attention Based Back Projection Networks

Liu, Wang, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)


Deep learning based image Super-Resolution (SR) has shown rapid development due to its ability to digest big data. Generally, deeper and wider networks can extract richer feature maps and generate SR images with remarkable quality. However, the more complex the network, the more time it consumes in practical applications. It is important to have a simplified network for efficient image SR. In this paper, we propose an Attention based Back Projection Network (ABPN) for image super-resolution. Similar to some recent works, we believe that the back projection mechanism can be further developed for SR. Enhanced back projection blocks are suggested to iteratively update low- and high-resolution feature residues. Inspired by recent studies on attention models, we propose a Spatial Attention Block (SAB) to learn the cross-correlation across features at different layers. Based on the assumption that a good SR image should be close to the original LR image after down-sampling, we propose a Refined Back Projection Block (RBPB) for final reconstruction. Extensive experiments on some public and AIM2019 Image Super-Resolution Challenge [4] datasets show that the proposed ABPN can provide state-of-the-art or even better performance in both quantitative and qualitative measurements.
