In this paper, the K-means (KM) and Fuzzy C-means (FCM) algorithms were compared for computing performance and clustering accuracy on cluster structures of different shapes, scattered both regularly and irregularly in two-dimensional space. While the accuracy of single-pass KM was lower than that of the FCM, KM with multiple starts achieved nearly the same clustering accuracy as the FCM. Moreover, KM with multiple starts was far superior to the FCM in computing time on all datasets analyzed. Therefore, when well-separated cluster structures with regular spreading patterns exist in a dataset, KM with multiple starts is recommended for cluster analysis because of its comparable accuracy and better runtime performance.
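The multiple-starts strategy described above can be sketched as follows. This is a minimal NumPy illustration of plain Lloyd's K-means that keeps the lowest-inertia run over several random initializations; the synthetic two-cluster data, the number of starts, and all function names are assumptions for the demo, not the paper's actual experimental setup.

```python
import numpy as np

def kmeans(X, k, n_iter=100, rng=None):
    """One run of Lloyd's K-means; returns (centroids, labels, inertia)."""
    rng = np.random.default_rng(rng)
    # initialize centroids from k distinct random points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assign each point to its nearest centroid
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    inertia = ((X - centroids[labels]) ** 2).sum()
    return centroids, labels, inertia

def kmeans_multi_start(X, k, starts=10, seed=0):
    """K-means with multiple random starts: keep the lowest-inertia run."""
    best = None
    for s in range(starts):
        result = kmeans(X, k, rng=seed + s)
        if best is None or result[2] < best[2]:
            best = result
    return best

# two well-separated 2-D clusters (hypothetical demo data)
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
centroids, labels, inertia = kmeans_multi_start(X, k=2)
```

Restarting from several random initializations and keeping the lowest-inertia solution is what makes single-machine K-means competitive in accuracy while remaining fast, which matches the abstract's recommendation.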
Discretization is a data pre-processing task that transforms continuous variables into discrete ones so that data mining algorithms such as association rule extraction and classification trees can be applied. In this study, we empirically compared the performance of equal-width intervals (EWI), equal-frequency intervals (EFI), and K-means clustering (KMC) for discretizing 14 continuous variables in a chicken egg quality traits dataset. We found that these unsupervised discretization methods can decrease the training error rates and increase the test accuracies of classification tree models. Comparing the training errors and test accuracies of models built with the C5.0 classification tree algorithm, we also found that EWI, EFI, and KMC produced more or less similar results. Among the rules for estimating the number of intervals, the Rice rule gave the best result with EWI but not with EFI. The Freedman-Diaconis rule with EFI, and the Doane rule with both EFI and EWI, performed slightly better than the other rules.
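The discretization methods and one of the interval-count rules compared above can be sketched as follows. This is a minimal NumPy illustration, not the study's implementation: the Rice rule (k = ceil(2 * n^(1/3))) sets the number of intervals, EWI cuts the variable's range into equal-width bins, and EFI cuts it at quantiles. The simulated "egg weight" variable and all function names are hypothetical.

```python
import numpy as np

def rice_k(n):
    """Rice rule for the number of intervals: k = ceil(2 * n^(1/3))."""
    return int(np.ceil(2 * n ** (1 / 3)))

def ewi(x, k):
    """Equal-width intervals: k bins of identical width over the range of x."""
    edges = np.linspace(x.min(), x.max(), k + 1)
    # digitize against the k-1 inner edges yields interval indices 0..k-1
    return np.digitize(x, edges[1:-1])

def efi(x, k):
    """Equal-frequency intervals: bin edges at the k-quantiles of x."""
    edges = np.quantile(x, np.linspace(0, 1, k + 1))
    return np.digitize(x, edges[1:-1])

# hypothetical continuous trait, e.g. egg weight in grams
rng = np.random.default_rng(1)
x = rng.normal(60, 5, 200)
k = rice_k(len(x))   # 200 observations -> k = ceil(2 * 200^(1/3)) = 12
w = ewi(x, k)        # equal-width codes
f = efi(x, k)        # equal-frequency codes
```

The design difference is visible in the bin counts: EWI bins vary in frequency with the shape of the distribution, while EFI bins hold roughly n/k observations each, which is why the two methods can favor different interval-count rules.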
In data mining, cluster analysis is one of the most widely used techniques for discovering groups in datasets. However, traditional clustering algorithms have become insufficient for analyzing the big data produced by the enormous growth in collected data in recent years. Scalability has therefore become one of the most intensively studied research topics in clustering big data. Parallel clustering algorithms and Map-Reduce-based techniques running on multiple machines are popular approaches to scalable big data analysis. However, applying sampling techniques to big datasets can still be an alternative or complementary approach that allows the traditional algorithms to run on a single machine. The results obtained in this study showed that data size reduction by simple random sampling can be used successfully in cluster analysis of large datasets. The clustering validities obtained by running the K-means algorithm on the sample datasets were as high as those for the complete datasets. Additionally, the execution time required for cluster analysis on the sample datasets was significantly shorter than that for the complete datasets.
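A minimal sketch of the sample-then-cluster idea, assuming a NumPy-only setting: cluster a simple random sample with Lloyd's K-means (farthest-point initialization here, an assumption chosen for stability, not the paper's method), then label the complete dataset with one nearest-centroid pass. The sampling fraction, seeds, and data are illustrative.

```python
import numpy as np

def sample_then_cluster(X, k, frac=0.1, seed=0, n_iter=50):
    """Cluster a simple random sample of X, then assign every point in X
    to the sample-fitted centroids ('frac' is an assumed sampling rate)."""
    rng = np.random.default_rng(seed)
    # simple random sample without replacement
    idx = rng.choice(len(X), size=max(k, int(frac * len(X))), replace=False)
    S = X[idx]
    # farthest-point initialization over the sample
    centroids = [S[rng.integers(len(S))]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(S - c, axis=1) for c in centroids], axis=0)
        centroids.append(S[d.argmax()])
    centroids = np.array(centroids)
    # plain Lloyd iterations on the sample only
    for _ in range(n_iter):
        labels = np.linalg.norm(S[:, None] - centroids, axis=2).argmin(axis=1)
        centroids = np.array([S[labels == j].mean(axis=0) if np.any(labels == j)
                              else centroids[j] for j in range(k)])
    # one cheap pass over the complete dataset to label every point
    full_labels = np.linalg.norm(X[:, None] - centroids, axis=2).argmin(axis=1)
    return centroids, full_labels

# two well-separated synthetic clusters standing in for a large dataset
rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, 0.5, (500, 2)), rng.normal(6, 0.5, (500, 2))])
centroids, labels = sample_then_cluster(X, k=2)
```

The expensive iterative work runs only on the sample; the full dataset is touched once for assignment, which is the source of the runtime savings the abstract reports.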