Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages 2022
DOI: 10.18653/v1/2022.computel-1.21

Fine-tuning pre-trained models for Automatic Speech Recognition, experiments on a fieldwork corpus of Japhug (Trans-Himalayan family)

Abstract: This is a report on results obtained in the development of speech recognition tools intended to support linguistic documentation efforts. The test case is an extensive fieldwork corpus of Japhug, an endangered language of the Trans-Himalayan (Sino-Tibetan) family. The goal is to reduce the transcription workload of field linguists. The method used is a deep learning approach based on the language-specific tuning of a generic pre-trained representation model, XLS-R, using a Transformer architecture. We note dif…
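As a rough, hypothetical illustration of the kind of pipeline the abstract describes (not the authors' exact recipe), the sketch below fine-tunes an XLS-R checkpoint with a freshly initialized CTC head using the Hugging Face transformers API. The symbol inventory, file paths, and hyperparameters are placeholder assumptions; a real run would derive the vocabulary from the transcribed corpus and supply the fieldwork audio.

```python
# Hypothetical sketch: language-specific tuning of pre-trained XLS-R with a
# CTC head, via the Hugging Face `transformers` API. Vocabulary, paths, and
# hyperparameters are illustrative placeholders, not the paper's exact setup.
import json

import torch
from transformers import (
    Trainer,
    TrainingArguments,
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2ForCTC,
    Wav2Vec2Processor,
)

# Placeholder symbol inventory; a real run would build this from the
# target-language transcriptions (e.g., the Japhug phoneme set).
symbols = list("abcdefghijklmnopqrstuvwxyz") + ["|", "[UNK]", "[PAD]"]
with open("vocab.json", "w") as f:
    json.dump({s: i for i, s in enumerate(symbols)}, f)

tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|"
)
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1,
    sampling_rate=16_000,
    padding_value=0.0,
    do_normalize=True,
    return_attention_mask=True,
)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

# Load the generic pre-trained XLS-R encoder and attach a fresh CTC head
# sized to the target vocabulary.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-xls-r-300m",
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,
    vocab_size=len(processor.tokenizer),
)
# Freeze the convolutional feature encoder; only the Transformer layers and
# the new CTC head are tuned on the target language.
model.freeze_feature_encoder()

training_args = TrainingArguments(
    output_dir="xlsr-japhug-demo",  # hypothetical output directory
    per_device_train_batch_size=8,
    learning_rate=3e-4,
    warmup_steps=500,
    num_train_epochs=30,
    fp16=torch.cuda.is_available(),
)

# `train_dataset` would hold {"input_values", "labels"} examples prepared
# with `processor` from the transcribed recordings (data loading not shown):
# trainer = Trainer(model=model, args=training_args,
#                   train_dataset=train_dataset, data_collator=collate_fn)
# trainer.train()
```

Freezing the convolutional encoder and tuning only the Transformer layers plus the CTC head is the usual low-resource recipe for wav2vec 2.0-family models, which is what makes this approach attractive when only a few hours of transcribed fieldwork audio are available.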

Cited by 5 publications (7 citation statements); references 9 publications.
“…For speech processing, fine-tuning can work not only for language model adaptation [51], [54], but also for tuning acoustic models [52], [53], [55], [56]. Fine-tuning language models in speech processing is the same as its use in NLP.…”
Section: Fine-tuning In Speech Processing
confidence: 99%
“…Fine-tuning language models in speech processing is the same as its use in NLP. Guillaume et al. [54] developed a method using a transformer architecture to tune a generic pre-trained representation model for phonemic recognition. For acoustic model adaptation, Violeta et al. [52] proposed an intermediate fine-tuning step that uses imperfect synthetic speech to close the domain-shift gap between the pre-training and target data.…”
Section: Fine-tuning In Speech Processing
confidence: 99%
“…Automatic Speech Recognition of "minority", "underresourced" languages is not only extremely important for the field of language documentation [7,8,9]: it also raises various scientific challenges [10,11]. Specifically, this area constitutes a particularly interesting test bed for evaluating and analyzing the properties of unsupervised language representations uncovered by neural networks such as wav2vec.…”
Section: Language Documentation: a Task That Presents Major Challenge...
confidence: 99%
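The quoted passage treats fieldwork ASR as a test bed for analyzing the representations that models such as wav2vec learn without supervision. As a hedged illustration of what such an analysis starts from, the sketch below extracts layer-wise hidden states from a pre-trained wav2vec 2.0 checkpoint; the model name and the random stand-in audio are assumptions for the example, not drawn from the cited papers.

```python
# Minimal sketch: extract frame-level representations from a pre-trained
# wav2vec 2.0 model for analysis (e.g., probing what each layer encodes).
# The checkpoint name and the random "audio" are placeholder assumptions.
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

model_name = "facebook/wav2vec2-xls-r-300m"  # any wav2vec 2.0-style checkpoint
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_name)
model = Wav2Vec2Model.from_pretrained(model_name)
model.eval()

# One second of fake 16 kHz audio stands in for a real field recording.
waveform = torch.randn(16_000)
inputs = feature_extractor(waveform.numpy(), sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# `hidden_states` is a tuple with one (batch, frames, dim) tensor per layer;
# layer-wise comparison of these tensors is the usual starting point for
# representation analysis.
for i, h in enumerate(outputs.hidden_states):
    print(f"layer {i:2d}: {tuple(h.shape)}")
```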
“…Transfer learning through pretrain-finetune is also prevalent in medical imaging (He et al, 2023) for tasks like disease diagnosis and organ segmentation. Additionally, it is widely employed in recommendation systems (Zhang et al, 2023b), speech recognition (Guillaume et al, 2022), and reinforcement learning (Luo et al, 2023).…”
Section: Applications Of Pretrain-finetune Framework
confidence: 99%