2020
DOI: 10.1007/978-3-030-63031-7_17

Low-Resource Text Classification via Cross-Lingual Language Model Fine-Tuning

Abstract: Text classification tends to be difficult when manually labeled text corpora are scarce. In low-resource agglutinative languages such as Uyghur, Kazakh, and Kyrgyz (UKK languages), words are formed by concatenating stems with several suffixes, and stems are used as the representation of text content; this morphology permits an effectively unbounded derivational vocabulary, which leads to high uncertainty in written forms and a large number of redundant features. There are major challenges of lo…
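As an illustration of the cross-lingual fine-tuning setup named in the title, the sketch below fine-tunes a pretrained multilingual encoder on labeled text from a better-resourced language and then applies it to a low-resource UKK language. This is a minimal sketch only: XLM-RoBERTa, the Hugging Face Transformers API, the label count, the example texts, and the hyperparameters are assumptions for illustration, not the configuration used in the paper.

# Minimal sketch (assumed setup): cross-lingual fine-tuning of a multilingual
# encoder for low-resource text classification.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=4  # hypothetical number of topic classes
)

# Hypothetical labeled data in a better-resourced related language (e.g. Turkish).
train_texts = ["Ekonomi haberleri ...", "Spor haberleri ..."]
train_labels = [0, 1]

enc = tokenizer(train_texts, truncation=True, padding=True, return_tensors="pt")
loader = DataLoader(
    TensorDataset(enc["input_ids"], enc["attention_mask"], torch.tensor(train_labels)),
    batch_size=2, shuffle=True,
)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):
    for input_ids, attention_mask, labels in loader:
        loss = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Zero-shot transfer: classify text in a low-resource language (Uyghur example).
model.eval()
with torch.no_grad():
    test = tokenizer(["ئۇيغۇرچە تېكىست ..."], return_tensors="pt", truncation=True)
    prediction = model(**test).logits.argmax(dim=-1)

Because all UKK languages share the multilingual encoder's subword vocabulary, the classifier fine-tuned on the better-resourced language can be evaluated or further fine-tuned on the low-resource language with no architectural changes.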

Citations: cited by 12 publications (5 citation statements)
References: 14 publications (17 reference statements)
“…In recent years, numerous researchers have turned their attention to contrastive learning [20][21][22][23][24][25], owing to its extraordinary performance in sentiment analysis [26][27][28]. Many models underpinned by contrastive learning have been introduced in natural language processing and computer vision.…”
Section: Contrastive Learning (citation type: mentioning)
confidence: 99%
“…We performed sequence tagging with different transformer models: (a) the uncased base implementations of BERT and mBERT (Devlin et al., 2018), (b) DistilmBERT (Sanh et al., 2019), trained using knowledge distillation, (c) XLM-RoBERTa (Conneau et al., 2019), and (d) Char-BERT (Boukkouri et al., 2020), which employs a character CNN to capture unknown and misspelled words. Motivated by prior work on multi-task learning (Chandu et al., 2018; Li et al., 2020), we also experiment with language-aware modeling. In these experiments, we added a language token either as part of the input encoding or as an output prediction.…”
Section: Datasets and Models (citation type: mentioning)
confidence: 99%
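The language-aware modeling described in the excerpt above (adding a language token to the input) can be sketched as follows. This is an assumed, minimal illustration rather than the cited authors' exact setup: the language tokens, tag inventory, example sentence, and base checkpoint are all hypothetical choices.

# Minimal sketch (assumed setup): language-aware sequence tagging with a
# per-example language token prepended to the input.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

LANG_TOKENS = ["[UYG]", "[KAZ]", "[KIR]"]  # assumed special tokens, one per language
TAGS = ["O", "B-ENT", "I-ENT"]             # hypothetical tag inventory

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-uncased")
tokenizer.add_special_tokens({"additional_special_tokens": LANG_TOKENS})

model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-multilingual-uncased", num_labels=len(TAGS)
)
model.resize_token_embeddings(len(tokenizer))  # make room for the new language tokens

def encode(words, word_tags, lang_token):
    # Prepend the language token, then align word-level tags to subword tokens.
    enc = tokenizer([lang_token] + words, is_split_into_words=True,
                    truncation=True, return_tensors="pt")
    labels = []
    for word_id in enc.word_ids(batch_index=0):
        if word_id is None or word_id == 0:   # [CLS]/[SEP] and the language token
            labels.append(-100)               # ignored by the loss
        else:
            labels.append(TAGS.index(word_tags[word_id - 1]))
    return enc, torch.tensor([labels])

enc, labels = encode(["Almaty", "is", "a", "city"], ["B-ENT", "O", "O", "O"], "[KAZ]")
out = model(**enc, labels=labels)  # out.loss is the token-classification loss

The alternative mentioned in the excerpt, predicting the language as an output, would instead add a language-identification head trained jointly with the tagger rather than a token in the input.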