Efficient feature integration with Wikipedia-based semantic feature extraction for Turkish text summarization

Güran, Aysun; Bayazit, Nilgun Guler; Gürbüz, Mustafa Zahid

doi:10.3906/elk-1201-15

Cited by 14 publications

(9 citation statements)

References 27 publications


“…Each class consists of only 200 examples. [11] obtained at their best 95.8 % for six-class category detection where there is only 100 documents under each category. There exists another study whose configuration is roughly equal to that of our study where the number of classes is 6 and there exists 600 document for each category, [16].…”

Section: The Results Of Deep Learning and Final Remarks (mentioning)

confidence: 99%

A Comparison of Different Approaches to Document Representation in Turkish Language

Yıldırım

Yıldız

2018

SDÜ Fen Bil Enst Der


Recently, deep learning methods have demonstrated state-of-the-art performance in numerous complex Natural Language Processing (NLP) problems. Easy access to high-performance computing resources and open-source libraries makes Artificial Intelligence (AI) approaches more applicable for researchers. This sudden growth of available techniques has shaped and improved standards in the field of NLP. Thus, owing to various open-source libraries and a large body of research, we have an opportunity to compare different approaches to document representation. We evaluate four paradigms for representing documents: traditional bag-of-words approaches, topic modeling, embedding-based approaches, and deep learning. As the main contribution of this article, we evaluate all these representation approaches with suitable machine learning algorithms for the document categorization problem in the Turkish language. The supervised architecture uses a benchmark dataset specifically prepared for this language. Within the architecture, we evaluate the representation approaches with corresponding machine learning algorithms such as Support Vector Machine (SVM), multinomial Naive Bayes (m-NB), and so forth. We conduct a variety of experiments and present successful results for Turkish document categorization. We also observe that traditional approaches still achieve results comparable to those of neural network models in document classification.
Turkish title (translated): A Comparison of Different Approaches to Text Representation Methods. Keywords: text representation, deep learning, natural language processing. Abstract (translated from Turkish): Recently, deep learning architectures have successfully solved many natural language processing problems. The prevalence of open-source libraries has made artificial intelligence approaches more applicable. This sudden acceleration in technology has transformed and improved the standards in natural language processing. In this study, thanks to the easy accessibility of open-source code and of research in the field, we were able to evaluate a significant portion of text representation approaches. We evaluated four paradigms in terms of text representation: the traditional bag-of-words approach, topic modeling, embedding representation, and deep learning. As the main contribution of the study, we addressed the text classification problem for Turkish using all of these text representations and the related machine learning algorithms. The resulting supervised learning architecture was tested with a dataset prepared specifically for Turkish. For each representation, compatible machine learning algorithms such as SVM and multinomial Naive Bayes (mNB) were tested. Based on various experiments, we discuss how a successful text classifier architecture can be built for Turkish and present successful models. Finally, we observed that traditional methods such as bag-of-words are still successful and outperform some deep-learning-based models.


“…If the indicated sentence involves title words, then this sentence considered as an important sentence for the summary text [12]. For each sentence, the involved title words are directly proportional to the summarization score of the sentence.…”

Section: Given Document D and Position S (M) Is The Final Sentence Po (mentioning)

confidence: 99%
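The title-word feature described in the statement above can be sketched in a few lines. This is a minimal illustration, not the cited authors' implementation: the function name `title_word_score`, the whitespace tokenization, and the normalization by the number of distinct title words are all assumptions made here.

```python
def title_word_score(sentence: str, title: str) -> float:
    """Fraction of distinct title words that also occur in the sentence.

    Assumption: words are split on whitespace and lowercased; a real
    Turkish system would likely add stemming or morphological analysis.
    """
    title_words = set(title.lower().split())
    if not title_words:
        return 0.0
    sentence_words = set(sentence.lower().split())
    # The score grows in direct proportion to the number of shared title words.
    return len(title_words & sentence_words) / len(title_words)
```

A sentence that contains every title word scores 1.0, and a sentence that shares none scores 0.0, so the value can be combined directly with other normalized features.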

“…Güran et al [11] used non-negative matrix factorization method as a feature reduction method and summarized 100 news documents. Güran et al [12] presented a summarization system that combines some structural and semantic features of sentences by using analytical hierarchical process (AHP) and artificial bee colony algorithm. Cığır et al [7] generated summaries by ranking sentences due to their scores calculated by combining the features such as term frequency, title similarity, key phrases, position of the sentence in the document, and centrality of the sentence.…”

Section: Related Work (mentioning)

confidence: 99%

“…Let S = {S(1), S… Here, w_ij weights are determined by AHP model such in the [12]. To generate a summary, all sentences are ranked due to their scores calculated in (9), and a number of the top-score sentences are included in the summary.…”

Section: Given Document D and Position S (M) Is The Final Sentence Po (mentioning)

confidence: 99%
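The ranking step in the statement above (score every sentence, keep the top-scoring ones) can be sketched as follows. The equation the statement calls (9) is not reproduced on this page, so the weighted sum below is an assumed stand-in with one weight per feature. The names `sentence_score` and `summarize`, and the choice to return the selected sentences in document order, are hypothetical.

```python
from typing import Dict, List


def sentence_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of feature scores (assumed form of the overall score)."""
    return sum(weights[name] * value for name, value in features.items())


def summarize(sentences: List[str],
              feature_rows: List[Dict[str, float]],
              weights: Dict[str, float],
              k: int) -> List[str]:
    """Keep the k highest-scoring sentences, returned in document order."""
    scored = [(sentence_score(row, weights), i) for i, row in enumerate(feature_rows)]
    # Sort by descending score; ties go to the earlier sentence.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:k]
    keep = sorted(i for _, i in top)
    return [sentences[i] for i in keep]
```

Restoring document order after selection is a design choice made here for readability of the summary; the statement itself only says that top-scoring sentences are included.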

“…The AHP model is used to integrate the scores of structural and semantic features into an overall sentence score as in [12]. The structural and semantic features are linearly combined using the weights determined by AHP.…”

Section: Introduction (mentioning)

confidence: 99%
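For readers unfamiliar with AHP, the sketch below shows one common way to turn a pairwise comparison matrix into feature weights, the row geometric-mean approximation. Whether the cited work uses this approximation or the principal-eigenvector method is not stated on this page, and the function name `ahp_weights` and the example matrix are hypothetical.

```python
import math
from typing import List


def ahp_weights(pairwise: List[List[float]]) -> List[float]:
    """Approximate AHP priority weights with the row geometric-mean method.

    pairwise[i][j] states how much more important feature i is than
    feature j, with pairwise[j][i] == 1 / pairwise[i][j].
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    # Normalize so the weights sum to 1 and can be used in a linear combination.
    return [g / total for g in geo_means]


# Hypothetical judgment: a structural feature is 3 times as important
# as a semantic feature.
weights = ahp_weights([[1.0, 3.0],
                       [1.0 / 3.0, 1.0]])
```

For this two-feature example the weights come out at approximately 0.75 and 0.25. For a perfectly consistent matrix the geometric-mean method agrees with the eigenvector method.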


A Turkish Wikipedia Text Summarization System for Mobile Devices

Hatipoglu

Omurca

2016

IJITCS


Abstract: Today, Wikipedia provides a very large and reliable domain-independent encyclopedic repository. This study presents a mobile system that summarizes Turkish Wikipedia text. The system selects sentences according to structural features of the Turkish language and semantic features of the sentences. Performance is evaluated based on the judgments of human experts. The results are tested using precision and recall values of a ranked sentence list, and it is concluded that the summarization results are promising.
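The evaluation mentioned in the abstract above, precision and recall over a ranked sentence list, can be illustrated with a short sketch. The abstract does not say how the list is cut off, so the cutoff `k`, the function name `precision_recall_at_k`, and the use of sentence indices as identifiers are assumptions.

```python
from typing import List, Set, Tuple


def precision_recall_at_k(ranked: List[int], relevant: Set[int], k: int) -> Tuple[float, float]:
    """Precision and recall of the top-k ranked sentences.

    ranked:   sentence indices ordered by system score, best first
    relevant: sentence indices chosen by human experts
    """
    top_k = ranked[:k]
    hits = sum(1 for s in top_k if s in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

With a system ranking of `[3, 1, 4, 2]`, expert selections `{1, 2}`, and `k = 2`, one of the two returned sentences is relevant and one of the two relevant sentences is found, so both precision and recall are 0.5.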


TR-SUM: An Automatic Text Summarization Tool for Turkish

Yüksel

Çebi

2023

Engineering Cyber-Physical Systems and Critical Infrastructures

