Residual Conv-Deconv Grid Network for Semantic Segmentation

Fourure, Damien; Emonet, Rémi; Fromont, Élisa; Muselet, Damien; Trémeau, Alain; Wolf, Christian

doi:10.48550/arxiv.1707.07958

Cited by 38 publications

(59 citation statements)

References 0 publications

“…3D CNN networks are the natural choices to capture spatiotemporal features among video frames. However, the existing architectures for pixel-wise tasks (e.g., UNet-3D [26]) adopt a single-stream Encoder-Decoder style architecture that aggregates multi-scale features by the process of sequential downsampling and skip-connection which may result in information loss [13]. Inspired by the success of GridNet [14,36] in efficiently incorporating multi-resolution features, we formulate a novel 3D version of GridNet namely "GridNet-3D" by replacing its 2D convolutional filters with 3D convolutional filters.…”

Section: Non-linear Motion Estimation (NME) Module (mentioning)

confidence: 99%

“…However, the existing architectures for pixel-wise tasks (e.g., UNet-3D [26]) adopt a single-stream Encoder-Decoder style architecture that aggregates multi-scale features by the process of sequential downsampling and skip-connection which may result in information loss [13]. Inspired by the success of GridNet [14,36] in efficiently incorporating multi-resolution features, we formulate a novel 3D version of GridNet namely "GridNet-3D" by replacing its 2D convolutional filters with 3D convolutional filters. GridNet-3D consists of three parallel streams to capture features with different resolutions and each stream has five convolutional blocks arranged in a sequence as shown in Fig.…”

Section: Non-linear Motion Estimation (NME) Module (mentioning)

confidence: 99%

“…F_{t→1} is refined in a similar manner to obtain F^r_{t→1}. We try with two types of motion refinement network in this work namely: 1) UNet-2D [45,50], and 2) GridNet-2D [11,14,36]. Finally, we choose GridNet-2D as the motion refinement network due to its superior performance (ref.…”

Section: Motion Refinement (MR) Module (mentioning)

confidence: 99%

“…In this section, we perform comparative studies among the different choices available for NME (UNet-2D [24], UNet-3D [26], GridNet-3D) and MR (UNet-2D [45], GridNet-2D [14]) modules to determine the best performing configuration.…”

Section: Experiments On Model Configurations (mentioning)

confidence: 99%

“…Choice of MR modules: We experiment with two types of motion refinement modules: UNet-2D [45] and GridNet-2D [14]. We use a standard encoder-decoder architecture with skip connections for UNet-2D.…”

Section: Experiments On Model Configurations (mentioning)

confidence: 99%

Non-linear Motion Estimation for Video Frame Interpolation using Space-time Convolutions

Dutta, Subramaniam, Mittal

2022

Preprint

Video frame interpolation aims to synthesize one or multiple frames between two consecutive frames in a video. It has a wide range of applications including slow-motion video generation, frame-rate up-scaling and developing video codecs. Some older works tackled this problem by assuming per-pixel linear motion between video frames. However, objects often follow a non-linear motion pattern in the real domain and some recent methods attempt to model per-pixel motion by non-linear models (e.g., quadratic). A quadratic model can also be inaccurate, especially in the case of motion discontinuities over time (i.e. sudden jerks) and occlusions, where some of the flow information may be invalid or inaccurate. In our paper, we propose to approximate the per-pixel motion using a space-time convolution network that is able to adaptively select the motion model to be used. Specifically, we are able to softly switch between a linear and a quadratic model. Towards this end, we use an end-to-end 3D CNN encoder-decoder architecture over bidirectional optical flows and occlusion maps to estimate the non-linear motion model of each pixel. Further, a motion refinement module is employed to refine the non-linear motion and the interpolated frames are estimated by a simple warping of the neighboring frames with the estimated per-pixel motion. Through a set of comprehensive experiments, we validate the effectiveness of our model and show that our method outperforms state-of-the-art algorithms on four datasets (Vimeo, DAVIS, HD and GoPro).
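To make the idea of softly switching between a linear and a quadratic motion model concrete, here is a minimal per-pixel sketch. It is not the authors' network: the abstract says a 3D CNN estimates the motion model, whereas this example assumes a hand-supplied blend weight `alpha` in [0, 1] and the textbook constant-velocity and constant-acceleration displacement formulas. The function names, the flow arguments `f01` (flow from frame 0 to frame 1) and `f0m1` (flow from frame 0 to frame -1), and the convex blending rule are all assumptions made for illustration.

```python
def linear_flow(f01, t):
    """Constant-velocity displacement from frame 0 to time t in [0, 1]."""
    return tuple(t * c for c in f01)


def quadratic_flow(f01, f0m1, t):
    """Constant-acceleration displacement from frame 0 to time t.

    With x(1) - x(0) = f01 and x(-1) - x(0) = f0m1, the acceleration is
    f01 + f0m1 and the initial velocity is 0.5 * (f01 - f0m1).
    """
    return tuple(
        0.5 * (a + b) * t * t + 0.5 * (a - b) * t
        for a, b in zip(f01, f0m1)
    )


def blended_flow(f01, f0m1, t, alpha):
    """Assumed soft switch: alpha = 0 is purely linear, alpha = 1 purely quadratic."""
    lin = linear_flow(f01, t)
    quad = quadratic_flow(f01, f0m1, t)
    return tuple((1.0 - alpha) * l + alpha * q for l, q in zip(lin, quad))


# A pixel that moved 1 unit in the previous frame interval and 3 in the next one.
print(blended_flow((3.0, 0.0), (-1.0, 0.0), t=0.5, alpha=0.5))
```

When the motion is truly constant-velocity (for example `f01 = (2, 0)` and `f0m1 = (-2, 0)`), the two models agree and the value of `alpha` has no effect; the blend only matters where the pixel accelerates.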

UNet++: A Nested U-Net Architecture for Medical Image Segmentation

Zhou, Siddiquee, Tajbakhsh, et al. 2018

Lecture Notes in Computer Science

In this paper, we present UNet++, a new, more powerful architecture for medical image segmentation. Our architecture is essentially a deeply-supervised encoder-decoder network where the encoder and decoder sub-networks are connected through a series of nested, dense skip pathways. The re-designed skip pathways aim at reducing the semantic gap between the feature maps of the encoder and decoder sub-networks. We argue that the optimizer would deal with an easier learning task when the feature maps from the decoder and encoder networks are semantically similar. We have evaluated UNet++ in comparison with U-Net and wide U-Net architectures across multiple medical image segmentation tasks: nodule segmentation in the low-dose CT scans of chest, nuclei segmentation in the microscopy images, liver segmentation in abdominal CT scans, and polyp segmentation in colonoscopy videos. Our experiments demonstrate that UNet++ with deep supervision achieves an average IoU gain of 3.9 and 3.4 points over U-Net and wide U-Net, respectively.

Neural Architecture Search for Dense Prediction Tasks in Computer Vision

et al. 2023

The success of deep learning in recent years has led to a rising demand for neural network architecture engineering. As a consequence, neural architecture search (NAS), which aims at automatically designing neural network architectures in a data-driven manner rather than manually, has evolved as a popular field of research. With the advent of weight sharing strategies across architectures, NAS has become applicable to a much wider range of problems. In particular, there are now many publications for dense prediction tasks in computer vision that require pixel-level predictions, such as semantic segmentation or object detection. These tasks come with novel challenges, such as higher memory footprints due to high-resolution data, learning multi-scale representations, longer training times, and more complex and larger neural architectures. In this manuscript, we provide an overview of NAS for dense prediction tasks by elaborating on these novel challenges and surveying ways to address them to ease future research and application of existing methods to novel problems.
