2021
DOI: 10.48550/arxiv.2106.14282
Preprint

A Closer Look at How Fine-tuning Changes BERT

Abstract: Given the prevalence of pre-trained contextualized representations in today's NLP, there have been several efforts to understand what information such representations contain. A common strategy to use such representations is to fine-tune them for an end task. However, how fine-tuning for a task changes the underlying space is less studied. In this work, we study the English BERT family and use two probing techniques to analyze how fine-tuning changes the space. Our experiments reveal that fine-tuning improves …

Cited by 3 publications (3 citation statements)
References 29 publications

“…This idea is echoed in recent machine learning literature, which has shown that it is possible to quickly adapt large pre-trained networks to a broad range of downstream tasks of interest via “fine-tuning” paradigms (Brown et al., 2020; Radford et al., 2019; Reid et al., 2022). As the name suggests, fine-tuning induces only small changes in the network representations (Zhou and Srikumar, 2021), suggesting that representations in the network can be quickly “re-associated” with new functionality for the downstream tasks.…”
Section: Discussion
confidence: 99%
“…Additional output layers are added to the model, each specifically tailored for NLP tasks. This phase is devoted to optimizing BERT's broad language interpretation for tasks, necessitating smaller, more focused datasets [56,57].…”
Section: Bidirectional Encoder Representations From Transformers (BERT)
confidence: 99%
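
For readers unfamiliar with the fine-tuning setup described in the excerpt above, the following is a minimal sketch of how a task-specific output layer is typically added on top of pre-trained BERT and trained on a small labelled dataset. It assumes the HuggingFace transformers library; the model name, task, and data are illustrative placeholders, not the setup used in the cited works.

# Minimal fine-tuning sketch: a fresh classification head is placed on top of
# pre-trained BERT and the whole model is trained on a small labelled dataset.
# The task (binary sentiment) and the two examples are hypothetical placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # adds a new task-specific output layer
)

texts = ["a great movie", "a dull movie"]   # placeholder examples
labels = torch.tensor([1, 0])               # placeholder labels

inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):                          # a few passes over the tiny dataset
    optimizer.zero_grad()
    outputs = model(**inputs, labels=labels)  # returns loss and logits
    outputs.loss.backward()
    optimizer.step()

Note that, unlike linear probing, every encoder parameter receives gradient updates here, which is precisely what allows fine-tuning to reshape the representation space the paper analyzes.
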
“…Linear Probing. Most of the works on updating pre-trained models have been mainly studied for language tasks (Dodge et al., 2020; Zhao et al., 2021; Zhou & Srikumar, 2021). In general computer vision setting, transfer learning and updating methods gained much attention (Zhai et al., 2019; Kornblith et al., 2019; Ericsson et al., 2021; Data Augmentation in FSL.…”
Section: Related Work
confidence: 99%
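
The "linear probing" named in the excerpt above refers to training only a lightweight linear classifier on top of frozen pre-trained representations. The sketch below illustrates the general idea under assumed choices (bert-base-uncased, a two-class toy task with placeholder data); it is not the specific protocol of any of the cited papers.

# Generic linear-probing sketch: the pre-trained encoder is frozen and only a
# linear classifier over its [CLS] representation is trained. Model name, task,
# and data are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
for p in encoder.parameters():          # freeze all encoder weights
    p.requires_grad = False

probe = torch.nn.Linear(encoder.config.hidden_size, 2)   # trainable linear head
optimizer = torch.optim.AdamW(probe.parameters(), lr=1e-3)

texts = ["a great movie", "a dull movie"]   # placeholder data
labels = torch.tensor([1, 0])
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

for _ in range(10):
    with torch.no_grad():
        cls = encoder(**inputs).last_hidden_state[:, 0]   # frozen [CLS] vectors
    logits = probe(cls)
    loss = torch.nn.functional.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Because the encoder stays frozen, probe accuracy reflects only what is linearly recoverable from the representations themselves, which is why probing of this kind is used to analyze what a representation space encodes.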