2020
DOI: 10.48550/arxiv.2003.13630
Preprint

TResNet: High Performance GPU-Dedicated Architecture

Abstract: Many deep learning models, developed in recent years, reach higher ImageNet accuracy than ResNet50, with fewer or comparable FLOPs count. While FLOPs are often seen as a proxy for network efficiency, when measuring actual GPU training and inference throughput, vanilla ResNet50 is usually significantly faster than its recent competitors, offering a better throughput-accuracy trade-off. In this work, we introduce a series of architecture modifications that aim to boost neural networks' accuracy, while retaining the…
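The abstract's central claim is that measured GPU throughput, not FLOPs, is the right efficiency metric. Below is a minimal, hedged sketch of such a throughput measurement in PyTorch, assuming the timm model zoo's names for ResNet50 and TResNet-L ("resnet50", "tresnet_l"); the batch size, resolution, fp16 setting, and iteration counts are illustrative assumptions, not the paper's exact protocol.

```python
# Hedged sketch: measure GPU inference throughput (images/sec) for two
# timm models. Assumes a CUDA device and the timm package are available.
import time

import torch
import timm


def measure_throughput(model_name: str, batch_size: int = 64,
                       resolution: int = 224, iters: int = 50) -> float:
    """Return images/second for half-precision inference on one GPU."""
    model = timm.create_model(model_name, pretrained=False).cuda().half().eval()
    x = torch.randn(batch_size, 3, resolution, resolution,
                    device="cuda", dtype=torch.half)
    with torch.no_grad():
        for _ in range(10):            # warm-up: trigger kernel compilation/caching
            model(x)
        torch.cuda.synchronize()       # make sure warm-up work has finished
        start = time.time()
        for _ in range(iters):
            model(x)
        torch.cuda.synchronize()       # wait for all timed kernels to complete
    return batch_size * iters / (time.time() - start)


for name in ["resnet50", "tresnet_l"]:
    print(f"{name}: {measure_throughput(name):.0f} img/s")
```

The synchronize calls matter: CUDA kernels launch asynchronously, so timing without torch.cuda.synchronize() would measure launch overhead rather than actual throughput.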

Cited by 19 publications (29 citation statements)
References 21 publications
“…We would like to highlight a few interesting observations:
(Radosavovic et al, 2020) 81.7% 39M 8B 21 -
RegNetY-16GF (Radosavovic et al, 2020) 82.9% 84M 16B 32 -
ResNeSt-101 (Zhang et al, 2020) 83.0% 48M 13B 31 -
ResNeSt-200 (Zhang et al, 2020) 83.9% 70M 36B 76 -
ResNeSt-269 (Zhang et al, 2020) 84.5% 111M 78B 160 -
TResNet-L (Ridnik et al, 2020) 83.8% 56M - 45 -
TResNet-XL (Ridnik et al, 2020) 84.3% 78M - 66 -
EfficientNet-X (Li et al, 2021) 84.7% 73M 91B - -
NFNet-F0 (Brock et al, 2021) 83.6% 72M 12B 30 8.9
NFNet-F1 (Brock et al, 2021) 84.7% 133M 36B 70 20
NFNet-F2 (Brock et al, 2021) 85.1% 194M 63B 124 36
NFNet-F3 (Brock et al, 2021) 85.7% 255M 115B 203 65
NFNet-F4 (Brock et al, 2021) 85.9% 316M 215B 309 126
ResNet-RS 84.4% 192M 128B - 61
LambdaResNet-420-hybrid 84.9% 125M - - 67
BotNet-T7-hybrid (Srinivas et al, 2021) 84.7% 75M 46B - 95
BiT-M-R152x2 (21k) (Kolesnikov et al, 2020) 85.2% 236M 135B 500 -…”
Section: ImageNet21k
confidence: 97%
“…More recent works aim to improve training or inference speed instead of parameter efficiency. For example, RegNet (Radosavovic et al, 2020), ResNeSt (Zhang et al, 2020), TResNet (Ridnik et al, 2020), and EfficientNet-X (Li et al, 2021) focus on GPU and/or TPU inference speed; Lambda Networks, NFNets (Brock et al, 2021), BoTNets (Srinivas et al, 2021), and ResNet-RS focus on TPU training speed. However, their training speed often comes with the cost of more parameters.…”
Section: Related Work
confidence: 99%
“…ResNet [13] is one of the most popular image classification architectures. It was a noteworthy improvement at the time it was introduced and continues to serve as the referent architecture for some analysis [8,55,56], or as a baseline in papers introducing new architectures [32,35,51,57].…”
Section: Related Work
confidence: 99%
“…MS COCO is an 80-class dataset where each image may have several labels because it contains several objects. Following the development of [2], we use TResNet as the base model [51], and threshold the vector of softmax probabilities so that the FDR is controlled at a user-specified level α. To set the threshold, we choose λ as in Algorithm 1, using 4,000 calibration points, and then we evaluate the FDR on an additional test set of 1,000 points.…”
Section: An Alternative Approach: Uniform Concentration
confidence: 99%
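For readers unfamiliar with this calibration procedure, the following is a minimal NumPy sketch of the thresholding step described in the quote: scan candidate thresholds λ and keep the smallest one whose empirical FDR on the calibration set is at most α. This is a simplification of the cited Algorithm 1, which controls the FDR via an upper concentration bound rather than the plain empirical estimate; the array names and the grid are hypothetical.

```python
# Simplified sketch of FDR-controlled multi-label thresholding.
# probs:  (n, K) per-class scores in [0, 1] from the classifier
# labels: (n, K) binary ground-truth multi-label matrix
import numpy as np


def empirical_fdr(probs: np.ndarray, labels: np.ndarray, lam: float) -> float:
    """Mean false-discovery proportion of the sets {k : probs[i, k] >= lam}."""
    predicted = probs >= lam                                  # predicted label sets
    discoveries = predicted.sum(axis=1)                       # set sizes |S_i|
    false_disc = (predicted & (labels == 0)).sum(axis=1)      # wrong labels in S_i
    # By convention, the false-discovery proportion is 0 for an empty set.
    fdp = np.where(discoveries > 0,
                   false_disc / np.maximum(discoveries, 1), 0.0)
    return float(fdp.mean())


def calibrate_lambda(probs, labels, alpha=0.1,
                     grid=np.linspace(0.0, 1.0, 1001)):
    """Smallest grid threshold with empirical FDR <= alpha.

    Relies on the FDR being (roughly) non-increasing in lambda: raising the
    threshold shrinks the predicted sets and makes them more precise.
    """
    for lam in grid:
        if empirical_fdr(probs, labels, lam) <= alpha:
            return lam
    return 1.0  # fall back to predicting empty sets

# Usage with the splits mentioned in the quote (4,000 calibration, 1,000 test):
# lam = calibrate_lambda(cal_probs, cal_labels, alpha=0.1)
# test_fdr = empirical_fdr(test_probs, test_labels, lam)
```

Choosing the smallest admissible λ maximizes the number of discoveries (predicted labels) subject to the FDR constraint, which is why the scan runs upward from zero.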