PACTran: PAC-Bayesian Metrics for Estimating the Transferability of Pretrained Models to Classification Tasks (2022)
DOI: 10.1007/978-3-031-19830-4_15

Cited by 8 publications · References 24 publications

Citation statements (6, ordered by relevance):
“…In the context of meta-learning, PAC-Bayesian theory is extensively studied to provide guarantees for generalization errors (Ding et al. 2021; Farid and Majumdar 2021; …).…”
Section: Hierarchical PAC-Bayesian Analysis (mentioning)
confidence: 99%
“…The hierarchical analysis decomposes the entire problem into three tiers: task, subject, and curriculum, which allows us to construct the overall curriculum bound by combining the bounds from lower tiers. In addition, our theoretical contribution also makes two novel extensions to existing PAC-Bayes literature (Amit and Meir 2018; Rothfuss et al. 2021; Ding et al. 2021), including (i) deriving a bound on noisy meta-learning tasks and (ii) tackling the non-i.i.d. task dependencies across different subjects.…”
Section: Introduction (mentioning)
confidence: 97%
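For reference, the classical single-task PAC-Bayes bound (McAllester-style) that hierarchical meta-learning analyses like the one quoted above extend; the notation here (prior P, posterior Q, sample size m, confidence level δ) is generic and not taken from the cited works:

```latex
% McAllester-style PAC-Bayes bound (single task, generic notation):
% with probability at least 1 - \delta over an i.i.d. sample S of size m,
% simultaneously for every posterior Q over hypotheses h,
\mathbb{E}_{h \sim Q}\!\left[L(h)\right]
  \;\le\;
  \mathbb{E}_{h \sim Q}\!\left[\hat{L}_S(h)\right]
  + \sqrt{\frac{\operatorname{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{m}}{\delta}}{2m}}
```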
“…To avoid exhaustive attempts on all pairs of source tasks and target tasks, TE provides efficient heuristics to identify the best-performing source task at minor cost. Originating in the field of CV, a great number of TE approaches have been proposed in the past few years, including model-similarity-based methods (Dwivedi and Roig, 2019), label-comparison-based methods (Tran et al., 2019), and source-feature-based methods (Ding et al., 2022). To adapt such techniques to PLM selection for NLP tasks, Bassignana et al. (2022) found that the predictions of LogME correlate positively with the true performance of candidate PLMs, and Vu et al. (2022) showed that model similarity computed from soft prompts reflects transfer performance across different models.…”
Section: Related Work (mentioning)
confidence: 99%
“…[Table excerpt, reconstructed from the flattened extraction; the headers of the three binary attribute columns are not recoverable from this excerpt:]

Model Similarity-based Methods
  DSE (Vu et al., 2020)                ϕ(x), ψ(x)   ✓ ✗ ✗
  DDS (Dwivedi et al., 2020)           ϕ(x), ψ(x)   ✓ ✗ ✗
Training-free Methods
  MSC (Meiseles and Rokach, 2020)      ϕ(x), y      ✗ ✗ ✓
  kNN (Puigcerver et al., 2021)        ϕ(x), y      ✗ ✗ ✓
  PARC (Bolya et al., 2021)            ϕ(x), y      ✗ ✗ ✓
  GBC                                  ϕ(x), y      ✗ ✗ ✓
  Logistic (Kumari et al., 2022)       ϕ(x), y      ✗ ✓ ✓
  H-score (Bao et al., 2019)           ϕ(x), y      ✗ ✗ ✓
  Reg. H-score (Ibrahim et al., 2022)  ϕ(x), y      ✗ ✗ ✓
  N LEEP                               ϕ(x), y      ✗ ✗ ✓
  TransRate (Huang et al., 2022)       ϕ(x), y      ✗ ✗ ✓
  LogME (You et al., 2021)             ϕ(x), y      ✗ ✓ ✓
  SFDA (Shao et al., 2022)             ϕ(x), y      ✗ ✓ ✓
  PACTran (Ding et al., 2022)          ϕ(x), y      ✗ ✓ ✓

… target classes by LR's test accuracy.…”
Section: Free of Training (mentioning)
confidence: 99%
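The training-free methods in the table above share one interface: a scalar score computed only from frozen features ϕ(x) and target labels y, used to rank pretrained models without fine-tuning. A minimal sketch of that interface, using a k-NN proxy in the spirit of the kNN row (the function name and defaults are ours; this illustrates the generic interface, not the PACTran metric itself):

```python
# Minimal sketch of a "training-free" transferability score: rank a
# pretrained encoder by k-NN cross-validated accuracy on its frozen
# features. Illustrative only; not the PACTran metric.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def knn_transferability_score(features: np.ndarray, labels: np.ndarray,
                              k: int = 5, folds: int = 5) -> float:
    """Score frozen features phi(x) against target labels y.

    Higher scores are expected to correlate with better downstream
    fine-tuning performance on the target classification task.
    """
    clf = KNeighborsClassifier(n_neighbors=k)
    return float(cross_val_score(clf, features, labels, cv=folds).mean())

# Usage (hypothetical): rank candidate encoders on the same target data.
# scores = {name: knn_transferability_score(phi(X), y)
#           for name, phi in encoders.items()}
```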