2023
DOI: 10.3389/frai.2023.1072329

Improving text mining in plant health domain with GAN and/or pre-trained language model

Abstract: The Bidirectional Encoder Representations from Transformers (BERT) architecture offers a cutting-edge approach to Natural Language Processing. It involves two steps: 1) pre-training a language model to extract contextualized features and 2) fine-tuning for specific downstream tasks. Although pre-trained language models (PLMs) have been successful in various text-mining applications, challenges remain, particularly in areas with limited labeled data such as plant health hazard detection from individuals' observ…
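For readers unfamiliar with the two-step approach described in the abstract, the minimal sketch below illustrates step 2 (attaching a classification head to a pre-trained encoder) with the Hugging Face Transformers library. The "camembert-base" checkpoint, the two-label setup, and the example sentence are illustrative assumptions, not details taken from the paper.

# Minimal sketch of fine-tuning-style inference with a pre-trained encoder.
# Assumes the public "camembert-base" checkpoint and a binary label space.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("camembert-base")
# The classification head is randomly initialized here; in practice it would
# be fine-tuned on labeled examples before being used for prediction.
model = AutoModelForSequenceClassification.from_pretrained("camembert-base", num_labels=2)

# Encode one hypothetical plant-health observation and score it.
inputs = tokenizer("Des taches brunes sont apparues sur les feuilles de vigne.",
                   return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class = logits.argmax(dim=-1).item()
print(predicted_class)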

Cited by 5 publications (2 citation statements)
References 20 publications
“…They implemented their model on several classification datasets and found that the performance of their semi-supervised model improves each time the size of the labeled dataset increases. Moreover, Jiang et al. [20] used CamemBERT and ChouBERT to build GAN-BERT models. They also examined how various losses behave as the number of labeled and unlabeled samples in the French training datasets changes, in order to provide greater insight into when and how to train GAN-BERT models for domain-specific document categorization.…”
Section: GAN-BERT
confidence: 99%
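For context, GAN-BERT augments a pre-trained encoder with a generator that produces fake sentence representations and a discriminator that classifies representations into the k real classes plus one extra "fake" class, which lets unlabeled text contribute to training. The PyTorch sketch below outlines that setup under assumed dimensions (noise size 100, hidden size 768, two real classes); it is an illustrative skeleton, not the authors' implementation.

import torch
import torch.nn as nn

NOISE_DIM = 100      # size of the generator's input noise (assumption)
HIDDEN_DIM = 768     # BERT/CamemBERT [CLS] representation size
NUM_CLASSES = 2      # k real classes; the discriminator adds one extra "fake" class

class Generator(nn.Module):
    """Maps random noise to a fake sentence representation in the encoder's hidden space."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NOISE_DIM, HIDDEN_DIM), nn.LeakyReLU(0.2),
            nn.Linear(HIDDEN_DIM, HIDDEN_DIM),
        )
    def forward(self, noise):
        return self.net(noise)

class Discriminator(nn.Module):
    """Classifies a representation into k real classes plus one 'fake' class."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(HIDDEN_DIM, HIDDEN_DIM), nn.LeakyReLU(0.2), nn.Dropout(0.1),
        )
        self.head = nn.Linear(HIDDEN_DIM, NUM_CLASSES + 1)
    def forward(self, rep):
        features = self.body(rep)
        return self.head(features), features  # logits and features (for feature matching)

# One illustrative step: real [CLS] vectors (labeled or unlabeled) vs. generated fakes.
generator, discriminator = Generator(), Discriminator()
real_cls = torch.randn(8, HIDDEN_DIM)            # stand-in for encoder output on real text
fake_cls = generator(torch.randn(8, NOISE_DIM))  # generated representations
real_logits, _ = discriminator(real_cls)
fake_logits, _ = discriminator(fake_cls)
# The supervised loss uses the first NUM_CLASSES logits of labeled examples; the
# unsupervised losses push real examples away from, and fakes toward, the extra class.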
“…The pre-trained language model has achieved great success in natural language processing (NLP). Inspired by this, a considerable number of pre-trained models have been proposed and applied to software engineering tasks, for example, service classification [16,17], code generation [18], code summarisation [19,20], code completion [21], and clone detection [15], achieving significant progress. In this paper, we adopt CodeT5 [15] as the base model.…”
Section: Pre-trained Language Model
confidence: 99%
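As a rough illustration of adopting a pre-trained code model such as CodeT5, the snippet below loads a publicly released summarization checkpoint with Hugging Face Transformers and generates a description for a toy function. The checkpoint name "Salesforce/codet5-base-multi-sum" and the generation settings are assumptions, not details from the cited work.

# Minimal sketch of code summarisation with a pre-trained CodeT5 checkpoint.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5-base-multi-sum")
model = AutoModelForSeq2SeqLM.from_pretrained("Salesforce/codet5-base-multi-sum")

code = "def add(a, b):\n    return a + b"
inputs = tokenizer(code, return_tensors="pt")
summary_ids = model.generate(**inputs, max_length=32)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))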