The Effect of BERT, ELECTRA and ALBERT Language Models on Sentiment Analysis for Turkish Product Reviews

Güven, Zekeriya Anıl

doi:10.1109/ubmk52708.2021.9559007

Cited by 7 publications

(3 citation statements)

References 0 publications

“…The language model (LM) analyzes the bodies of texts for tasks such as word prediction and classification. It takes the word sequences from the text bodies as input and computes a probability distribution for the task (Guven, 2021b). LMs are divided into two types: unidirectional and bidirectional.…”

Section: Language Models (unclassified)

Türkçe E-postalarda Spam Tespiti için Makine Öğrenme Yöntemlerinin ve Dil Modellerinin Analizi [Analysis of Machine Learning Methods and Language Models for Spam Detection in Turkish E-mails]

Güven

2023

EJOSAT

With the recent development of technology and social networks, online interaction and sharing opinions on any topic have gained considerable importance. Although these interactions have positive aspects, they also have many negative ones. Obtaining users' information on social networks and impersonating them is a major security problem, since it enables fraud and similar abuse carried out in the users' names. The most common way to impersonate users is to send spam messages, e-mails, and the like. To overcome this security problem, measures such as spam filtering and the development of spam detection methods are applied. In this study, the machine learning methods Random Forest, Logistic Regression, Naive Bayes, and Artificial Neural Networks and the language models BERT, ELECTRA, ALBERT, and DistilBERT are analyzed for detecting spam in Turkish e-mails. The aim is to show the effect of language models on classifying spam e-mails in Turkish. In the experiments, all language models were more successful than the machine learning methods at classifying spam e-mails. Among the machine learning methods, artificial neural networks achieved an accuracy of 90.15%, while the most successful language models, BERT and ELECTRA, achieved an accuracy of 94.08%.

“…Their results show that it is possible to achieve few-shot performance similar to GPT-3 with much smaller language models. Due to the instability of manually designed prompts, many subsequent studies explore automatically searching the prompts, either in a discrete space (Gao, Fisch, and Chen 2021; Jiang et al 2020; Haviv, Berant, and Globerson 2021; Shin et al 2020; Ben-David, Oved, and Reichart 2021) or in a continuous space (Qin and Eisner 2021; Hambardzumyan, Khachatrian, and May 2021; Han et al 2021; Liu et al 2021b). The discrete prompt is usually designed as natural language phrases with blank to be filled while the continuous prompt is a sequence of vectors that can be updated arbitrarily during learning.…”

Section: Prompt-based Few-shot Learning (mentioning)

confidence: 99%

“…Similar to the structure of GAN (Goodfellow et al 2014), it pre-trains a small generator to replace some tokens in an input with their plausible alternatives and then a large discriminator to distinguish whether each word has been replaced by the generator or not. The unique effectiveness of pre-trained token-replaced detection model intrigues many studies to apply it in many NLP tasks, such as fact verification (Naseer, Asvial, and Sari 2021), question answering (Alrowili and Shanker 2021; Yamada, Asai, and Hajishirzi 2021), grammatical error detection (Yuan et al 2021), emotional classification (Zhang, Yu, and Zhu 2021; Guven 2021), and medication mention detection (Lee et al 2020). There are also some other studies that upgrade or extend the token-replaced detection pre-training mechanism.…”

Section: Token-replaced Detection (mentioning)

confidence: 99%

Pre-trained Token-replaced Detection Model as Few-shot Learner

Li, Li, Zhou

2022

Preprint

Pre-trained masked language models have demonstrated remarkable ability as few-shot learners. In this paper, as an alternative, we propose a novel approach to few-shot learning with pre-trained token-replaced detection models like ELECTRA. In this approach, we reformulate a classification or a regression task as a token-replaced detection problem. Specifically, we first define a template and label description words for each task and put them into the input to form a natural language prompt. Then, we employ the pre-trained token-replaced detection model to predict which label description word is the most original (i.e., least replaced) among all label description words in the prompt. A systematic evaluation on 16 datasets demonstrates that our approach outperforms few-shot learners with pre-trained masked language models in both one-sentence and two-sentence learning tasks.
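
The selection step described in this abstract can be sketched roughly as follows. This is a toy illustration only, not the authors' implementation: the template, the label description words, and the "replaced" probabilities are all hypothetical stand-ins for what a token-replaced detection model (such as an ELECTRA discriminator) would produce, and no real model is called.

```python
from typing import Dict


def build_prompt(text: str, label_words: Dict[str, str]) -> str:
    """Form a prompt that contains every label description word.

    The template "<text> It was <word1> <word2>." is a hypothetical choice
    made for this sketch; the abstract does not specify one.
    """
    words = " ".join(label_words.values())
    return f"{text} It was {words}."


def pick_label(label_words: Dict[str, str], replaced_prob: Dict[str, float]) -> str:
    """Return the label whose description word looks least replaced.

    `replaced_prob` maps each description word to an assumed probability
    that a discriminator judged the word to be a replacement.
    """
    return min(label_words, key=lambda label: replaced_prob[label_words[label]])


# Hypothetical label description words for a sentiment task.
label_words = {"positive": "great", "negative": "terrible"}
prompt = build_prompt("The battery lasts all day.", label_words)

# Hypothetical discriminator output: probability that each word was replaced.
scores = {"great": 0.12, "terrible": 0.91}
prediction = pick_label(label_words, scores)

print(prompt)
print(prediction)
```

In a real setup the scores would come from running the prompt through a pre-trained discriminator and reading off the per-token predictions at the positions of the label description words.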

Sentiment Analysis of Twitter Data by Natural Language Processing and Machine Learning

Chaurasia, Sherekar

2023

Studies in Autonomic, Data-Driven and Industrial Computing
