2021
DOI: 10.48550/arxiv.2111.06053
Preprint

Improving Large-scale Language Models and Resources for Filipino

Cited by 3 publications (4 citation statements)
References 11 publications
“…For this study, since we are dealing with participant responses transcribed in Tagalog, we use the robustly optimized Tagalog BERT model, RoBERTa, as the main language model of choice [19]. We set the parameters of KeyBERT to generate 10 potential keyword groups, each containing 5 candidate keywords.…”
Section: Language-model Assisted Keyword Extraction (mentioning)
confidence: 99%
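A minimal sketch of what this setup might look like with the KeyBERT API, assuming the Hugging Face checkpoint name jcblaise/roberta-tagalog-base for the Tagalog RoBERTa model and one plausible mapping of the "10 groups of 5 candidates" configuration onto KeyBERT's top_n and nr_candidates parameters (neither detail is given in the excerpt):

```python
# Sketch only: the checkpoint name and the parameter mapping below are
# assumptions, not taken from the cited study.
from keybert import KeyBERT

# KeyBERT wraps a sentence-transformers-compatible encoder; a Tagalog
# RoBERTa checkpoint is plugged in instead of the English default.
kw_model = KeyBERT(model="jcblaise/roberta-tagalog-base")

doc = "Ang mga kalahok ay nagbahagi ng kanilang karanasan sa pag-aaral."

keywords = kw_model.extract_keywords(
    doc,
    keyphrase_ngram_range=(1, 2),  # unigrams and bigrams
    stop_words=None,               # the English stop list does not apply to Tagalog
    top_n=10,                      # 10 keyword groups
    use_maxsum=True,               # diversify picks over a candidate pool
    nr_candidates=50,              # e.g. 10 groups x 5 candidates each
)
print(keywords)  # list of (keyphrase, similarity score) pairs
```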
“…To achieve the best-performing model, an experimental setup involving the three (3) transformer encoder models was prepared. Specifically, the BERT Tagalog Base (BERT-Base) (Cruz and Cheng, 2019), RoBERTa Tagalog Base (RoBERTa-Base) (Cruz and Cheng, 2021), and RoBERTa Tagalog Large (RoBERTa-Large) (Cruz and Cheng, 2021) models were fine-tuned and tested using the dataset discussed in Subsection 3.6. Furthermore, all of the models were fine-tuned and tested on an NVIDIA RTX A6000 GPU using the GECToR model's default fine-tuning and predicting hyperparameters.…”
Section: Experiments Setup (mentioning)
confidence: 99%
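For reference, the three encoders compared above can be pulled from the Hugging Face Hub. A small loading sketch follows, assuming the publicly released checkpoint names for these models (the GECToR fine-tuning pipeline itself is not reproduced here):

```python
# Assumed public checkpoint names; the excerpt does not state them.
from transformers import AutoModel, AutoTokenizer

CHECKPOINTS = {
    "BERT-Base":     "jcblaise/bert-tagalog-base-cased",
    "RoBERTa-Base":  "jcblaise/roberta-tagalog-base",
    "RoBERTa-Large": "jcblaise/roberta-tagalog-large",
}

for name, ckpt in CHECKPOINTS.items():
    tokenizer = AutoTokenizer.from_pretrained(ckpt)
    encoder = AutoModel.from_pretrained(ckpt)
    # Each encoder would then be wrapped by GECToR's tagging head and
    # fine-tuned with the repo's default hyperparameters.
    print(f"{name}: {encoder.num_parameters():,} parameters")
```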
“…This poses a problem for low-resource languages such as Filipino. Workarounds such as synthetic dataset creation (Grundkiewicz et al., 2019) and large-scale corpus creation (Cruz and Cheng, 2021) have been created to address this.…”
Section: Introduction (mentioning)
confidence: 99%
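As a rough illustration of the synthetic-data workaround, rule-based noising in the spirit of Grundkiewicz et al. (2019) corrupts clean sentences to manufacture (noisy, clean) training pairs. The operations and edit probability below are illustrative, not taken from the cited papers:

```python
# Illustrative word-level noising for synthetic GEC data; the specific
# edit operations and rate are assumptions for the sketch.
import random

def noise_sentence(words, p_edit=0.15):
    """Corrupt a tokenized sentence with simple random edits."""
    out = list(words)
    i = 0
    while i < len(out):
        if random.random() < p_edit:
            op = random.choice(["delete", "duplicate", "swap"])
            if op == "delete":
                del out[i]          # simulate a missing word
                continue
            elif op == "duplicate":
                out.insert(i, out[i])  # simulate a repeated word
                i += 1
            elif op == "swap" and i + 1 < len(out):
                out[i], out[i + 1] = out[i + 1], out[i]  # word-order error
        i += 1
    return out

clean = "maganda ang panahon ngayon".split()
pair = (noise_sentence(clean), clean)  # (source, target) training pair
print(pair)
```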
“…The following heuristic-based filters, based on Cruz and Cheng (2021), are used before applying the others:…”
Section: Heuristic-based (mentioning)
confidence: 99%
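The excerpt cuts off before listing the actual filters, so the following is a hypothetical sketch of heuristic document filters of the kind commonly used in corpus cleaning; the rules and thresholds are illustrative only:

```python
# Hypothetical heuristic filters; not the actual rules from
# Cruz and Cheng (2021), which the excerpt does not show.
def passes_heuristics(text: str,
                      min_words: int = 5,
                      max_words: int = 2000,
                      min_alpha_ratio: float = 0.7) -> bool:
    words = text.split()
    if not (min_words <= len(words) <= max_words):
        return False  # too short or too long to be useful prose
    alpha = sum(c.isalpha() for c in text)
    if alpha / max(len(text), 1) < min_alpha_ratio:
        return False  # mostly symbols, digits, or markup debris
    return True

raw_docs = ["Maganda ang panahon ngayon sa Maynila.", "@@## 123"]
kept = [d for d in raw_docs if passes_heuristics(d)]  # keeps only the first
```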