2020
DOI: 10.1109/access.2020.3015854

AgglutiFiT: Efficient Low-Resource Agglutinative Language Model Fine-Tuning

Abstract: Text classification tends to be difficult when the amount of manually labeled text corpora is inadequate. In low-resource agglutinative languages such as Uyghur, Kazakh, and Kyrgyz (UKK languages), words are formed by concatenating a stem with several suffixes, and stems are used as the representation of text content. This morphology permits an effectively unbounded derivational vocabulary, which leads to high uncertainty in written forms and a large number of redundant features. There are major challenges of lo…
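
The morphological property described in the abstract can be illustrated with a short sketch. The suffix list, length guard, and example words below are hypothetical and not taken from the paper; they only show how stripping suffixes maps many surface forms onto one stem and shrinks the redundant derivational vocabulary.

```python
# A minimal sketch, assuming a hypothetical Latin-transliterated suffix list:
# stripping suffixes so many surface forms map onto one stem, which reduces the
# redundant derivational vocabulary the abstract describes.

# Longest suffixes first so greedy stripping peels off "ning" before "ni".
SUFFIXES = sorted(["lar", "ler", "ning", "ni", "da", "de", "din", "gha", "ge"],
                  key=len, reverse=True)

def extract_stem(word: str) -> str:
    """Greedily strip known suffixes from the end of a word (illustrative only)."""
    stripped = True
    while stripped:
        stripped = False
        for suffix in SUFFIXES:
            # Keep at least a 3-character residue so short stems survive.
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                word = word[: -len(suffix)]
                stripped = True
                break
    return word

def normalize(text: str) -> str:
    """Map every whitespace-separated token to its approximate stem."""
    return " ".join(extract_stem(token) for token in text.split())

# Two inflected forms of the same (hypothetical) word collapse to one stem.
print(normalize("kitablarning kitabni"))  # -> "kitab kitab"
```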

Cited by 16 publications (15 citation statements)
References 15 publications (14 reference statements)
“…Z. Li et al. [101] proposed AgglutiFiT, an efficient strategy that fine-tunes a pre-trained language model for sentiment analysis and text classification. They fine-tuned the model on a low-noise fine-tuning dataset created through morphological analysis and stem extraction.…”
Section: B Pre-trained (Transformers), mentioning
confidence: 99%
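
As a rough illustration of the fine-tuning step this excerpt describes, the sketch below runs one fine-tuning step of a pre-trained multilingual encoder on stem-normalized text using the Hugging Face transformers API. The model name, the placeholder normalize() step, the toy sentences, and the labels are assumptions, not the authors' implementation.

```python
# A minimal sketch, not the authors' implementation: one fine-tuning step of a
# pre-trained multilingual encoder on stem-normalized text for classification.
# The model name, labels, toy sentences, and the normalize() placeholder are
# assumptions made for illustration.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-multilingual-cased"  # assumption: any multilingual encoder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

def normalize(text: str) -> str:
    # Placeholder for the morphological analysis / stem extraction step;
    # identity here to keep the sketch short.
    return text

texts = ["placeholder sentence one", "placeholder sentence two"]  # stem-normalized training text
labels = torch.tensor([0, 1])                                     # e.g. sentiment labels

batch = tokenizer([normalize(t) for t in texts],
                  padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
loss = model(**batch, labels=labels).loss  # cross-entropy over the two classes
loss.backward()
optimizer.step()
print(float(loss))
```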
“…Refs. [17][18][19] have carried out extensive research on cross-domain network structures based on contrastive learning, and applying these methods to aspect-level sentiment analysis would also be a significant advance; Refs. [20][21][22] use cross-domain network structures to study text classification tasks when only a few corpora are available. Since aspect-level sentiment analysis is itself a classification task, whether it can draw on these methods is worth exploring in subsequent studies.…”
Section: Introduction, mentioning
confidence: 99%
“…We try to explain this phenomenon. XLM-R may contain more general information in its lower layers (Li et al., 2020), whereas BERT captures surface features in lower layers, syntactic features in middle layers, and semantic features in higher layers (Jawahar, Sagot & Seddah, 2019). Because the surface features of Chinese are not easy to recognize, when mBERT learns the shallow features of Chinese in its lower layers, its recognition ability for Chinese is lower than XLM-R’s.…”
Section: Introduction, mentioning
confidence: 99%
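
The layer-wise reasoning in this excerpt can be made concrete with a small sketch that extracts per-layer hidden states from a multilingual encoder; the model name, the Chinese example sentence, and the mean-pooling choice are assumptions made only for illustration, not taken from the cited works.

```python
# Illustrative sketch, not taken from the cited works: extracting per-layer
# hidden states from a multilingual encoder to inspect the lower / middle /
# higher layer representations the excerpt reasons about.
import torch
from transformers import AutoTokenizer, AutoModel

name = "bert-base-multilingual-cased"   # mBERT; "xlm-roberta-base" would give XLM-R
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, output_hidden_states=True)

inputs = tokenizer("这是一个例子", return_tensors="pt")  # "This is an example"
with torch.no_grad():
    hidden_states = model(**inputs).hidden_states  # embeddings + one tensor per layer

# Mean-pooled sentence vectors from a lower, a middle, and a higher layer.
for layer in (1, len(hidden_states) // 2, len(hidden_states) - 1):
    sentence_vec = hidden_states[layer].mean(dim=1)
    print(f"layer {layer}: vector shape {tuple(sentence_vec.shape)}")
```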