Comparison of Topic Modeling Methods for Type Detection of Turkish News

Güven, Zekeriya Anıl; Diri, Banu; Çakaloğlu, Tolgahan

doi:10.1109/ubmk.2019.8907050

Cited by 9 publications

(4 citation statements)

References 10 publications

“…Güven et al. [24] aimed to determine the category of news comparing the results utilizing four topic modeling algorithms: classical latent Dirichlet allocation (LDA), latent semantic analysis (LSA), non-negative matrix factorization (NMF) algorithms, and n-LDA. A dataset that consists of 4200 Turkish news titles and 7 class labels is obtained from the Turkish news websites.…”

Section: Turkish News Classification (mentioning)

confidence: 99%

Improving automated Turkish text classification with learning‐based algorithms

Köksal

Yılmaz

2022

Concurrency and Computation

Text classification is the process of determining categories or tags of a document depending on its content. Although text classification is a well‐known process, it has many steps that require tuning to improve mathematical models. This article provides a novel methodology and expresses key points to improve text classification performance using learning‐based algorithms and techniques. First, to check the effectiveness of the proposed methodology, we selected two public Turkish news benchmarking datasets. Then, we performed extensive testing using both supervised machine learning algorithms and state‐of‐art pre‐trained language models. The experimental results show that our methodology outperforms previous news classification studies on these benchmarking datasets improving categorization results based on F1‐score. Therefore, we conclude that the presented methodology efficiently improves the classification results and selects the feasible classifier for a given dataset.

“…It has 7 different class labels: economy, politics, magazine, sports, health, technology, and events. There are a total of 4200 news texts, 600 for each news label [18]. 80% of this dataset is used for training and 20% for testing.…”

Section: Dataset (mentioning)

confidence: 99%

Performance Comparison of Large Language Models, GPT and Gemini on Turkish News Classification Task

Guven

2024

Preprint

Recently, large language models-LLMs have become very popular in many tasks of natural language processing. Examples of these tasks include text classification, question answering, text summarization, and text generation for natural language processing. Apart from LLMs, GPT and Gemini models are at the top of the list in terms of use for text generation tasks. This study aims to contribute to the literature on the use and comparison of LLMs and text generation models for the Turkish language. To achieve this purpose, the dataset consisting of Turkish news was classified by training BERT, ALBERT, DistilBERT, ELECTRA, XLM-RoBERTA LLMs with fine-tuning. Additionally, GPT-3.5 and Gemini text generation models were used by sending prompts for this classification task, and the success of the models was compared with LLMs. As a result of all analyses, the BERT model gave 97.619% accuracy among LLMs, while Gemini gave 99.167% accuracy among text generation models.
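The split quoted for this entry (4200 news texts, 600 per label, 80% for training and 20% for testing) can be checked with a few lines of arithmetic. This is only a minimal sketch: the quote does not say whether the split was stratified by label, so the per-label figures below assume that it was, and the names `split_sizes`, `LABELS`, and `PER_LABEL` are illustrative.

```python
# Labels and counts are taken from the quoted citation statement.
LABELS = ["economy", "politics", "magazine", "sports",
          "health", "technology", "events"]
PER_LABEL = 600
TRAIN_FRACTION = 0.8


def split_sizes(n_items, train_fraction):
    """Return (train, test) counts for a simple fractional split."""
    n_train = round(n_items * train_fraction)
    return n_train, n_items - n_train


total = len(LABELS) * PER_LABEL
train_total, test_total = split_sizes(total, TRAIN_FRACTION)

# Assumption: the split is stratified, so each label keeps the same ratio.
train_per_label, test_per_label = split_sizes(PER_LABEL, TRAIN_FRACTION)

print(total, train_total, test_total, train_per_label, test_per_label)
```

Under the stratification assumption, every label contributes the same number of test items, which keeps per-class scores comparable.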

“…Güven et al. performed emotion classification with the LDA algorithm on 4000 tweets covering 5 different emotion types (angry, afraid, happy, sad, surprised) [8]. In another study, again using LDA, they compared the performance of topic modeling algorithms on topics such as economy, sports, and lifestyle, using a dataset of 4200 Turkish news headlines belonging to 7 classes [9]. In experiments with different numbers of classes, the NMF method performed well for 3 classes, while the Latent Semantic Analysis (LSA) method achieved the best performance for 7 classes.…”

Section: Literature Review (unclassified)

Türkçe Metinlerde Otomatik Konu Tespiti

Aydın

Hallac

2021

Fırat Üniversitesi Mühendislik Bilimleri Dergisi

This study proposes a topic detection system that can be used online. A Turkish corpus consisting of a total of 400,000 news documents from 4 different categories was trained with Latent Dirichlet Allocation. The model can detect, with high accuracy, the topics of newly arriving documents that are not in the training data. To evaluate the topic models, in addition to the coherence value, 2 different approaches were developed for obtaining scores that apply to classification methods, such as precision, recall, and F-measure. These approaches rely on matching topics to the classes that the documents belong to. The first approach provides a general success measure based on whether a topic corresponding to the document's class is present. The second approach provides a measure that filters out predictions the model does not make with high confidence and evaluates against a threshold value, under the assumption that "the most prominent topic of a document is the class it belongs to". Under the proposed evaluation methods, topic detection accuracies of 94.2% and 90.9% were obtained, respectively.
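The second evaluation approach described in this abstract (drop low-confidence predictions, then check whether the most prominent topic maps to the document's class) could be sketched roughly as follows. This is not the authors' implementation: the function name, the topic-to-class mapping, the toy probabilities, and the threshold of 0.6 are all hypothetical values chosen for illustration.

```python
def dominant_topic_accuracy(doc_topic_probs, topic_to_class,
                            true_classes, threshold):
    """Score documents whose top topic probability reaches the threshold.

    Returns (accuracy over kept documents, number of kept documents).
    """
    kept = 0
    correct = 0
    for probs, true_class in zip(doc_topic_probs, true_classes):
        # Index of the most prominent topic for this document.
        top_topic = max(range(len(probs)), key=probs.__getitem__)
        if probs[top_topic] < threshold:
            continue  # low-confidence prediction is filtered out
        kept += 1
        if topic_to_class[top_topic] == true_class:
            correct += 1
    accuracy = correct / kept if kept else 0.0
    return accuracy, kept


# Hypothetical toy data: 2 topics, 4 documents.
doc_topic_probs = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8], [0.7, 0.3]]
topic_to_class = {0: "sports", 1: "economy"}
true_classes = ["sports", "economy", "economy", "economy"]

accuracy, kept = dominant_topic_accuracy(
    doc_topic_probs, topic_to_class, true_classes, threshold=0.6)
print(accuracy, kept)
```

In this toy run the second document is dropped because its top topic probability is below the threshold, so accuracy is computed over the three remaining documents only.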
