Transfer learning from natural image datasets, particularly ImageNet, using standard large models and corresponding pretrained weights has become a de-facto method for deep learning applications to medical imaging. However, there are fundamental differences in data sizes, features and task specifications between natural image classification and the target medical tasks, and there is little understanding of the effects of transfer. In this paper, we explore properties of transfer learning for medical imaging. A performance evaluation on two large scale medical imaging tasks shows that surprisingly, transfer offers little benefit to performance, and simple, lightweight models can perform comparably to ImageNet architectures. Investigating the learned representations and features, we find that some of the differences from transfer learning are due to the over-parametrization of standard models rather than sophisticated feature reuse. We isolate where useful feature reuse occurs, and outline the implications for more efficient model exploration. We also explore feature independent benefits of transfer arising from weight scalings.