2023
DOI: 10.3389/frai.2023.1072329

Improving text mining in plant health domain with GAN and/or pre-trained language model

Abstract: The Bidirectional Encoder Representations from Transformers (BERT) architecture offers a cutting-edge approach to Natural Language Processing. It involves two steps: 1) pre-training a language model to extract contextualized features and 2) fine-tuning for specific downstream tasks. Although pre-trained language models (PLMs) have been successful in various text-mining applications, challenges remain, particularly in areas with limited labeled data such as plant health hazard detection from individuals' observ…
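For readers unfamiliar with the two-step approach described in the abstract, the minimal sketch below illustrates step 2 (attaching a classification head to a pre-trained encoder) with the Hugging Face Transformers library. The "camembert-base" checkpoint, the two-label setup, and the example sentence are illustrative assumptions, not details taken from the paper.

# Minimal sketch of fine-tuning-style inference with a pre-trained encoder.
# Assumes the public "camembert-base" checkpoint and a binary label space.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("camembert-base")
# The classification head is randomly initialized here; in practice it would
# be fine-tuned on labeled examples before being used for prediction.
model = AutoModelForSequenceClassification.from_pretrained("camembert-base", num_labels=2)

# Encode one hypothetical plant-health observation and score it.
inputs = tokenizer("Des taches brunes sont apparues sur les feuilles de vigne.",
                   return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class = logits.argmax(dim=-1).item()
print(predicted_class)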

Cited by 5 publications (2 citation statements)
References 20 publications
“…They implemented their model on several classification datasets and found that the performance of their semi-supervised model improves each time the size of the labeled dataset increases. Moreover, Jiang et al. [20] used CamemBERT and ChouBERT to build GAN-BERT models. They also examined how various losses behave as the number of labeled and unlabeled samples in the French training datasets changes, in order to provide greater insight into when and how to train GAN-BERT models for domain-specific document categorization.…”
Section: GAN-BERT
confidence: 99%
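For context, GAN-BERT augments a pre-trained encoder with a generator that produces fake sentence representations and a discriminator that classifies representations into the k real classes plus one extra "fake" class, which lets unlabeled text contribute to training. The PyTorch sketch below outlines that setup under assumed dimensions (noise size 100, hidden size 768, two real classes); it is an illustrative skeleton, not the authors' implementation.

import torch
import torch.nn as nn

NOISE_DIM = 100      # size of the generator's input noise (assumption)
HIDDEN_DIM = 768     # BERT/CamemBERT [CLS] representation size
NUM_CLASSES = 2      # k real classes; the discriminator adds one extra "fake" class

class Generator(nn.Module):
    """Maps random noise to a fake sentence representation in the encoder's hidden space."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NOISE_DIM, HIDDEN_DIM), nn.LeakyReLU(0.2),
            nn.Linear(HIDDEN_DIM, HIDDEN_DIM),
        )
    def forward(self, noise):
        return self.net(noise)

class Discriminator(nn.Module):
    """Classifies a representation into k real classes plus one 'fake' class."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(HIDDEN_DIM, HIDDEN_DIM), nn.LeakyReLU(0.2), nn.Dropout(0.1),
        )
        self.head = nn.Linear(HIDDEN_DIM, NUM_CLASSES + 1)
    def forward(self, rep):
        features = self.body(rep)
        return self.head(features), features  # logits and features (for feature matching)

# One illustrative step: real [CLS] vectors (labeled or unlabeled) vs. generated fakes.
generator, discriminator = Generator(), Discriminator()
real_cls = torch.randn(8, HIDDEN_DIM)            # stand-in for encoder output on real text
fake_cls = generator(torch.randn(8, NOISE_DIM))  # generated representations
real_logits, _ = discriminator(real_cls)
fake_logits, _ = discriminator(fake_cls)
# The supervised loss uses the first NUM_CLASSES logits of labeled examples; the
# unsupervised losses push real examples away from, and fakes toward, the extra class.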
“…The pre-trained language model has achieved great success in natural language processing (NLP). Inspired by this, a considerable number of pre-trained models have been proposed and applied to software engineering tasks, for example, service classification [16,17], code generation [18], code summarisation [19,20], code completion [21], and clone detection [15], achieving significant progress. In this paper, we adopt CodeT5 [15] as the base model.…”
Section: Pre-trained Language Model
confidence: 99%
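As a rough illustration of adopting a pre-trained code model such as CodeT5, the snippet below loads a publicly released summarization checkpoint with Hugging Face Transformers and generates a description for a toy function. The checkpoint name "Salesforce/codet5-base-multi-sum" and the generation settings are assumptions, not details from the cited work.

# Minimal sketch of code summarisation with a pre-trained CodeT5 checkpoint.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5-base-multi-sum")
model = AutoModelForSeq2SeqLM.from_pretrained("Salesforce/codet5-base-multi-sum")

code = "def add(a, b):\n    return a + b"
inputs = tokenizer(code, return_tensors="pt")
summary_ids = model.generate(**inputs, max_length=32)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))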