Abstract: Whilst computer vision models built using self-supervised approaches are now commonplace, some important questions remain. Do self-supervised models learn highly redundant channel features? What if a self-supervised network could dynamically select the important channels and get rid of the unnecessary ones? Currently, convnets pre-trained with self-supervision obtain performance on downstream tasks comparable to their supervised counterparts in computer vision. However, there are drawbacks…
“…To make training feasible, the Gumbel-Softmax trick (Jang et al., 2016) is adopted. The Gumbel trick has been widely used as a reparameterisation technique for the task of dynamic channel selection (Krishna et al., 2022; Li et al., 2021; Herrmann et al., 2020; Veit & Belongie, 2018). For more clarity, refer to Figure 4 in Appendix A.3.…”
Section: Methods (mentioning)
confidence: 99%
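To make the excerpt above concrete, the following is a minimal sketch of how the Gumbel-Softmax trick can yield (approximately) binary, per-input channel gates that remain differentiable during training. The module name, the pooled-MLP logit source, and the temperature are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GumbelChannelGate(nn.Module):
    """Samples a hard on/off decision per channel via the Gumbel-Softmax trick.

    Sketch only: logits are produced from a global-average-pooled input feature
    map passed through a small linear layer; all names/sizes are assumptions.
    """
    def __init__(self, in_channels, gated_channels, tau=1.0):
        super().__init__()
        self.tau = tau
        self.gated_channels = gated_channels
        # Two logits per gated channel: [keep, drop]
        self.fc = nn.Linear(in_channels, gated_channels * 2)

    def forward(self, x):
        # x: (B, C_in, H, W) -> per-channel keep/drop logits
        pooled = F.adaptive_avg_pool2d(x, 1).flatten(1)          # (B, C_in)
        logits = self.fc(pooled).view(-1, self.gated_channels, 2)
        # Straight-through Gumbel-Softmax: hard one-hot decisions in the
        # forward pass, soft (differentiable) gradients in the backward pass.
        g = F.gumbel_softmax(logits, tau=self.tau, hard=True)    # (B, C_out, 2)
        return g[..., 0]                                          # binary keep-mask, (B, C_out)
```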
“…Most works on dynamic computation have been confined to supervised learning. Recently, (Krishna et al., 2022) used SimSiam (Chen & He, 2021) as a self-supervised objective combined with a dynamic channel gating (DGNet) (Li et al., 2021) mechanism trained from scratch, and showed that comparable performance can be achieved under channel budget constraints. Likewise, (Meng et al., 2022) used channel gating-based dynamic pruning (CGNet) (Hua et al., 2019) augmented with contrastive learning to achieve inference speed-ups without substantial loss of performance.…”
Section: Self-supervised Dynamic Computation and Beyond (mentioning)
confidence: 99%
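The following sketch illustrates the general DGNet/CGNet-style idea referenced above: a convolutional block whose output channels are switched on or off per input. It reuses the hypothetical GumbelChannelGate from the previous sketch; the gate placement and residual-free wiring are simplifying assumptions, not a specific reference implementation.

```python
class GatedConvBlock(nn.Module):
    """Conv block with input-dependent channel gating (DGNet/CGNet-style sketch)."""
    def __init__(self, in_channels, out_channels, tau=1.0):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.gate = GumbelChannelGate(in_channels, out_channels, tau=tau)

    def forward(self, x):
        mask = self.gate(x)                     # (B, C_out), approximately binary
        out = F.relu(self.bn1(self.conv1(x)))
        out = out * mask[:, :, None, None]      # zero out channels the gate switched off
        out = F.relu(self.bn2(self.conv2(out)))
        return out, mask                        # the mask is reused later for the budget loss
```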
“…A common practice to reduce this computational burden is to extract a lightweight sub-network from an off-the-shelf pre-trained model, or to pre-train the model as part of a multi-step training process and further compress it by applying techniques such as knowledge distillation (KD) (Hinton et al., 2015), pruning (Frankle & Carbin, 2018), dynamic computation (DC) (Veit & Belongie, 2018), etc. SSL-based pre-training combined with KD (Tian et al., 2019; Abbasi Koohpayegani et al., 2020; Fang et al., 2021), DC (Krishna et al., 2022; Meng et al., 2022), or pruning (Caron et al., 2020; …) also serves as an effective way to obtain a lightweight sub-network for a given downstream task. This sequential learning procedure often involves fine-tuning a pre-trained self-supervised model on a downstream task along with the corresponding training objective of KD, DC, or pruning with cross-entropy (CE) loss.…”
Section: Introduction (mentioning)
confidence: 99%
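As a concrete illustration of the sequential fine-tuning procedure mentioned in this excerpt, below is a minimal sketch of a standard KD objective (Hinton et al., 2015) combined with cross-entropy. The temperature T and mixing weight alpha are illustrative assumptions, not values reported in the cited works.

```python
def kd_finetune_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Cross-entropy on labels plus a temperature-scaled distillation term."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * ce + (1.0 - alpha) * kd
```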
“…large language models (LLMs)) via fine-tuning makes the overall process computationally more expensive and cumbersome. Furthermore, downstream tasks are diverse and vary widely; therefore, any change in the downstream task usually requires repeating the entire procedure multiple times. [Figure 1 caption fragment:] …the setting of (Krishna et al., 2022), but modified as per our use case (i.e., instead of SimSiam (Chen & He, 2021) we use the VICReg objective). c. This work: we learn a dense encoder and a set of gates based on a budget constraint t_d.…”
Section: Introduction (mentioning)
confidence: 99%
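One common way to tie the learned gates to a target budget t_d is to penalise the deviation of the average fraction of active channels from t_d and add this penalty to the self-supervised (e.g., VICReg) loss. The sketch below is an assumption about how such a constraint can be written, not the paper's exact formulation; lambda_budget and the squared-error form are hypothetical choices.

```python
import torch

def budget_loss(masks, t_d):
    """Penalise deviation of the realised channel budget from the target t_d.

    `masks` is a list of (B, C) gate tensors collected from the gated blocks;
    a single global target and a squared error are simplifying assumptions.
    """
    used = torch.cat([m.mean(dim=1, keepdim=True) for m in masks], dim=1)  # (B, num_blocks)
    return (used.mean() - t_d) ** 2

# Hypothetical joint objective: self-supervised loss on the gated forward pass
# plus the budget penalty.
# total_loss = vicreg_loss(z_a, z_b) + lambda_budget * budget_loss(masks, t_d)
```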
“…To obtain W, we follow dynamic channel selection (DCS) (Veit & Belongie, 2018; Li et al., 2021) to induce sparsity while maintaining the network topology. Figure 1a depicts the traditional setting of alternating between pre-training and fine-tuning, while Figure 1b depicts the setting recently introduced in (Krishna et al., 2022) using dynamic channel selection along with self-supervision. From now on, lightweight network, sub-network, and gated network refer to the same thing.…”
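A short sketch of the point about topology: the per-layer gate decisions W are collected during an ordinary forward pass, and unselected channels are merely zeroed rather than removed, so the dense architecture is left intact. The encoder structure (a list of the hypothetical GatedConvBlock modules from the earlier sketch) is an assumption.

```python
def gated_forward(blocks, x):
    """Run the gated encoder once, collecting per-layer gate decisions W."""
    masks = []
    for block in blocks:       # blocks: list of GatedConvBlock instances
        x, mask = block(x)
        masks.append(mask)     # W: one (B, C) gate matrix per gated layer
    return x, masks
```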
Self-supervised learning (SSL) approaches have made major strides forward by emulating the performance of their supervised counterparts on several computer vision benchmarks. This, however, comes at the cost of substantially larger model sizes and computationally expensive training strategies, which eventually lead to larger inference times, making them impractical for resource-constrained industrial settings. Techniques like knowledge distillation (KD), dynamic computation (DC), and pruning are often used to obtain a lightweight sub-network, which usually involves multiple epochs of fine-tuning of a large pre-trained model, making the overall process computationally expensive. In this work we propose a novel perspective on the interplay between the SSL and DC paradigms that can be leveraged to simultaneously learn a dense and a gated (sparse/lightweight) sub-network from scratch, offering a good accuracy-efficiency trade-off and therefore yielding a generic, multipurpose architecture for application-specific industrial settings. Our study overall conveys a constructive message: exhaustive experiments on several image classification benchmarks (CIFAR-10, STL-10, CIFAR-100, and ImageNet-100) demonstrate that the proposed training strategy provides a dense and a corresponding sparse sub-network that achieve performance on par with the vanilla self-supervised setting, but at a significant reduction in computation in terms of FLOPs under a range of target budgets.
Convolutional neural networks have made significant strides in solving computer vision tasks at the expense of high computational demands. This complexity hinders efficient processing, particularly on devices with limited computational resources such as edge devices. One way to overcome this limitation is conditional computing, which optimizes inference by selectively utilizing parts of the network depending on the characteristics of the input. A recent conditional execution method is Conditional Information Gain Trellis (CIGT), which routes samples based on an information gain-based router mechanism. The original CIGT model was designed to route a single sample along a single path in a trellis structure. In this study, advanced inference strategies that allow inputs to traverse multiple paths are proposed to improve the performance of the vanilla CIGT model. These strategies aim to find a middle ground between improved model performance and increased computational demands. For this purpose, two techniques were proposed: a Cross-Entropy Search-based threshold optimization algorithm and a Reinforcement Learning-based routing strategy. The first method treats multi-path routing in CIGT as a black-box optimization problem, and the second interprets it as a Markov Decision Process, with a Q-Learning-based supervised regression algorithm designed as the solution. Both of these methods provide significant performance improvements compared to the original CIGT model, with an adjustable increase in computation. Experiments were conducted on two image datasets, with additional statistical tests and analyses to inspect the behavior of the proposed algorithms. The novel multi-path routing methods designed in this study show potential for both the original CIGT model and similar conditional computation approaches that use sample-dependent routing mechanisms to select parts of the network.
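The Cross-Entropy Search mentioned above treats the routing thresholds as a black-box optimization problem. Below is a generic Cross-Entropy Method sketch for such a search; the evaluate callback (assumed to run the multi-path model on a validation set and return an accuracy/compute trade-off score), the Gaussian sampling distribution, the [0, 1] threshold range, and all hyper-parameters are illustrative assumptions, not details taken from the CIGT paper.

```python
import numpy as np

def cross_entropy_threshold_search(evaluate, num_thresholds, iters=20,
                                   pop_size=64, elite_frac=0.1, seed=0):
    """Cross-Entropy Method over routing thresholds (black-box optimization sketch)."""
    rng = np.random.default_rng(seed)
    mu = np.full(num_thresholds, 0.5)        # thresholds assumed to live in [0, 1]
    sigma = np.full(num_thresholds, 0.25)
    n_elite = max(1, int(pop_size * elite_frac))
    for _ in range(iters):
        # Sample candidate threshold vectors and score each one.
        samples = np.clip(rng.normal(mu, sigma, size=(pop_size, num_thresholds)), 0.0, 1.0)
        scores = np.array([evaluate(s) for s in samples])
        # Refit the sampling distribution to the best-scoring candidates.
        elites = samples[np.argsort(scores)[-n_elite:]]
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mu
```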