“…Choosing the appropriate supervised learning algorithm is essential for achieving accurate content classification and categorization. The task involves evaluating various algorithms such as support vector machines (SVM), random forests, or deep learning models like convolutional neural networks (CNNs) or transformer-based architectures (e.g., BERT, GPT) [4][5][6]. Additionally, model optimization techniques such as hyperparameter tuning, cross-validation, and regularization need to be applied to enhance the model's performance [16,17].…”
Section: Statement For the Task
confidence: 99%
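The model-selection step described in the snippet above (evaluating candidate algorithms with cross-validation) can be sketched in a few lines. This is a minimal, standard-library illustration of k-fold cross-validation; the `train_fn`/`score_fn` names are placeholders for whichever classifier and metric are being compared, not part of the cited work.

```python
import random
from statistics import mean

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k roughly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(train_fn, score_fn, X, y, k=5):
    """Average held-out score of a model over k folds."""
    scores = []
    for held_out in k_fold_indices(len(X), k):
        held = set(held_out)
        X_tr = [x for i, x in enumerate(X) if i not in held]
        y_tr = [t for i, t in enumerate(y) if i not in held]
        model = train_fn(X_tr, y_tr)
        scores.append(score_fn(model,
                               [X[i] for i in held_out],
                               [y[i] for i in held_out]))
    return mean(scores)
```

In practice this loop would wrap a real estimator (e.g., an SVM or random forest) and a grid of hyperparameters; the averaged fold score is what lets the candidates be compared fairly.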
“…Transformer-based architectures such as BERT (Bidirectional Encoder Representations from Transformers) have gained significant attention in natural language processing tasks due to their ability to capture contextual information effectively [4][5][6]. In the context of content management, these architectures can be leveraged to enhance the accuracy and efficiency of content classification and categorization.…”
Section: Main Part
confidence: 99%
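The "contextual information" that BERT-style models capture comes from self-attention: each token's output vector is a weighted mixture of all token vectors in the sequence. The following is a deliberately tiny sketch of scaled dot-product self-attention with identity Q/K/V projections, just to make the mechanism concrete; real transformer layers add learned projections, multiple heads, and feed-forward sublayers.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(X):
    """Scaled dot-product self-attention over token vectors X.
    Q = K = V = X here, to keep the sketch minimal."""
    d = len(X[0])
    out = []
    for q in X:
        # similarity of this token's query to every token's key
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in X]
        w = softmax(scores)
        # output = attention-weighted mixture of all value vectors
        out.append([sum(wi * v for wi, v in zip(w, col))
                    for col in zip(*X)])
    return out
```

Each output row blends information from the whole sequence, which is why these architectures represent a word differently depending on its context.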
“…By utilizing unsupervised learning techniques such as clustering or dimensionality reduction, content management systems can automatically group similar documents, images, or videos together based on their inherent similarities. This can be particularly useful when dealing with large volumes of unstructured data where manually labeling each piece of content is impractical or infeasible [4][5][6].…”
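The automatic grouping described above is what a clustering algorithm such as k-means does over feature vectors of the content. A naive, standard-library k-means sketch (in practice one would use an optimized implementation over real embeddings):

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Naive k-means: group feature vectors into k clusters
    by repeatedly assigning points to the nearest centroid."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # move each centroid to the mean of its cluster
        centroids = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
    return clusters
```

No labels are needed: items end up grouped purely by similarity of their feature vectors, which is exactly the property that makes this useful for large unlabeled collections.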
The object of research is the processes of data filtering and machine learning in content management systems. The subject of research is the development of a hybrid approach to data filtering based on a combination of supervised and unsupervised machine learning. The article explores machine learning approaches to content management and how they can change the way we organize, categorize, and derive value from vast amounts of data. The main goal is to develop and apply a hybrid approach to data filtering and training that optimizes resource consumption and supports supervised training for better categorization in the future. The approach combines elements of supervised and unsupervised learning around the BERT architecture in a processing flow that reduces resource usage and adapts the algorithm to perform better in a specific domain. As a result, the intelligent system was able to optimize itself for a specific field of use and reduce resource costs. Conclusion: after applying the hybrid approach of data filtering and machine learning to existing data streams, we obtain a performance increase of up to 5%, and this percentage grows with the running time of the application.
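The hybrid flow the abstract describes (unsupervised grouping feeding a cheaper supervised step) can be sketched abstractly. This is an illustration of the general pattern, not the article's actual pipeline: cluster similar items first, label only one representative per cluster, and propagate that label, which is where the resource saving comes from.

```python
def hybrid_filter(items, cluster_fn, label_fn):
    """Hybrid sketch: unsupervised clustering groups similar items,
    then one representative per cluster is labeled (the supervised step)
    and the label is propagated, cutting labeling/inference cost."""
    labeled = {}
    for cluster in cluster_fn(items):
        representative = cluster[0]
        label = label_fn(representative)   # the only supervised call
        for item in cluster:
            labeled[item] = label
    return labeled
```

With n items in c clusters, the expensive supervised model runs c times instead of n, which is consistent with the resource-consumption argument made above.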
“…The most important achievement of this model is that it is pre-trained on corpora in 104 different languages and performs quite well even in low-resource languages. In addition, the M-BERT model is trained taking into account the structures of all these languages [37]. In this study, a pre-trained M-BERT model supporting 104 languages, including Turkish, with 12 stacked Transformer blocks, a hidden dimension of 768, 12 self-attention heads, and about 110,000,000 parameters overall was used.…”
Homophobic expressions are a form of insulting the sexual orientation or personality of people. Severe psychological traumas may occur in people who are exposed to this type of communication. It is important to develop automatic classification systems based on language models to examine social media content and distinguish homophobic discourse. This study aims to present a pre-trained Multilingual Bidirectional Encoder Representations from Transformers (M-BERT) model that can successfully detect whether Turkish comments on social media contain homophobic or related hate comments (i.e., sexist, severe humiliation, and defecation expressions). Comments in the Homophobic-Abusive Turkish Comments (HATC) dataset were collected from Instagram to train the detection models. The HATC dataset was manually labeled at the sentence level and combined with the Abusive Turkish Comments (ATC) dataset that was developed in our previous study. The HATC dataset was balanced using the resampling method, and two forms of the dataset (i.e., the resampled resHATC and the original HATC) were used in the experiments. Afterward, the M-BERT model was compared with DL-based models (i.e., Long Short-Term Memory, Bidirectional Long Short-Term Memory (BiLSTM), Gated Recurrent Unit), traditional machine learning (TML) classifiers (i.e., Support Vector Machine, Naive Bayes, Random Forest), and ensemble classifiers (i.e., Adaptive Boosting, eXtreme Gradient Boosting, Gradient Boosting) for the best model selection. The performance of the detection models was evaluated using the F1-score, precision, and recall metrics. Results showed the best performance (homophobic F1-score: 82.64%, hateful F1-score: 91.75%, neutral F1-score: 96.08%, average F1-score: 90.15%) was achieved with the M-BERT model on the HATC dataset. The M-BERT detection model can increase the effectiveness of filters in detecting Turkish homophobic and related hate speech in social networks.
It can be used to detect homophobic and related hate speech for different languages since the M-BERT model has multilingual pre-trained data.
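The per-class F1-scores reported above (homophobic, hateful, neutral) follow directly from one-vs-rest precision and recall. For reference, this is how such a per-class score is computed from predictions; the label strings are illustrative:

```python
def per_class_f1(y_true, y_pred, label):
    """Precision, recall, and F1 for one class (one-vs-rest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Averaging these per-class F1 values over the three classes yields the macro-style "average F1-score" quoted in the abstract.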
“…Comparative investigations demonstrate that the bidirectional LSTM-connected conditional random field (CRF) model outperforms the LSTM-connected conditional random field (CRF) model. Existing event extraction methods [16,17], usually designed for news and similar corpora, mainly rely on trigger words to detect certain events and then extract the relevant event parameters, which makes them unsuitable for unstructured personnel resume texts [18]. The author of [19] proposed that event types can be detected through the key parameters of the event, without relying on trigger words, and the event parameters can then be extracted.…”
Current methods for extracting information from user resumes do not work well with unstructured resumes in economic announcements, nor with documents that mention the same users. In this study, unstructured user information is turned into structured user-information templates, and a way to build person-relationship graphs in the field of economics is proposed. First, the lightweight blockchain-based BERT model (B-BERT) is trained. The trained B-BERT pre-training model is then used to obtain the event instance vector, categorize it appropriately, and populate the hierarchical user-information templates with accurate user characteristics. The research investigates the approach of creating character-connection graphs in the Chinese financial system and suggests a framework for doing so in the economic sector. Furthermore, the relationships between users are found through the filled-in user-information templates, and a graph of user relationships is built. Finally, the approach is validated on a manually annotated dataset. In tests, the method can extract text information from unstructured economic user resumes and build a relationship map of people in the financial field. The experimental results show that the proposed approach is capable of efficiently retrieving information from unstructured financial personnel resume text and generating a character-relationship graph in the economic sphere.
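The template-filling step in the abstract above, populating a hierarchical user-information template with extracted event parameters, can be sketched as a simple merge. The section and field names here are hypothetical illustrations, not taken from the paper:

```python
def fill_template(template, extracted_events):
    """Populate a hierarchical user-information template from extracted
    event parameters. An event carries a target section and a dict of
    argument values; only empty (None) fields are filled."""
    filled = {section: dict(fields) for section, fields in template.items()}
    for event in extracted_events:
        section = event.get("section")
        if section in filled:
            for field, value in event.get("args", {}).items():
                if field in filled[section] and filled[section][field] is None:
                    filled[section][field] = value
    return filled
```

Once templates are filled this way, shared field values (e.g., the same company across two people's templates) are what link nodes in the person-relationship graph.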