Abstract: Many methods have been proposed in the literature for determining the number of clusters in cluster analysis, which is used across a broad range of applied sciences such as physics, chemistry, biology, engineering, and economics. The aim of this paper is to determine the number of clusters of a dataset in model-based clustering by using the Analytic Hierarchy Process (AHP). In this study, the AHP model was built from the following information criteria: Akaike's Information Criterion (AIC), Approximate Weight of Evidence (AWE), Bayesian Information Criterion (BIC), Classification Likelihood Criterion (CLC), and Kullback Information Criterion (KIC). The performance of the proposed approach was tested on common real and synthetic datasets. The proposed approach produced accurate results, and these results were more accurate than those obtained from the individual information criteria.
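The abstract does not say how the AHP comparison matrices are derived from the information criteria, so the following is only a minimal sketch of the generic AHP step of turning a pairwise comparison matrix into priority weights. It uses the row geometric-mean approximation, which is one common way to do this; the function name `ahp_priorities` and the 3x3 matrix are hypothetical and are not taken from the paper.

```python
import math

def ahp_priorities(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method.

    `pairwise` is a square reciprocal matrix: pairwise[i][j] says how strongly
    alternative i is preferred over alternative j, and pairwise[j][i] is its
    reciprocal.
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    # Normalize so the weights sum to 1.
    return [g / total for g in geo_means]

# Hypothetical comparison of three candidate cluster counts (illustration only).
pairwise = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_priorities(pairwise)
best_index = max(range(len(weights)), key=weights.__getitem__)
```

In a setting like the one described, each information criterion would presumably contribute its own comparison of the candidate cluster counts, and the resulting weights would be combined; that combination step is not specified in the abstract and is left out here.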
Cluster analysis based on a mixture of multivariate normal distributions is commonly used for clustering multidimensional datasets. Model selection is one of the most important problems in this kind of mixture cluster analysis. It involves determining the number of components (clusters) and selecting an appropriate covariance structure. In this study, the efficiency of the information criteria commonly used in model selection is examined. The effectiveness of each information criterion was assessed by its success in selecting the number of components and an appropriate covariance matrix.
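As an illustration of how information criteria drive this kind of model selection, the sketch below computes AIC and BIC from a fitted model's maximized log-likelihood and picks the candidate with the smallest BIC. The parameter counts are the standard ones for a Gaussian mixture with `k` components in `d` dimensions; the log-likelihood values, sample size, and helper names are hypothetical and do not come from the study.

```python
import math

def n_free_params(k, d, covariance="full"):
    """Free parameters of a k-component Gaussian mixture in d dimensions."""
    weights = k - 1          # mixing proportions sum to 1
    means = k * d
    if covariance == "full":
        cov = k * d * (d + 1) // 2   # one symmetric matrix per component
    elif covariance == "diag":
        cov = k * d                  # one variance per dimension per component
    elif covariance == "spherical":
        cov = k                      # one variance per component
    else:
        raise ValueError(f"unknown covariance structure: {covariance}")
    return weights + means + cov

def aic(loglik, p):
    return -2.0 * loglik + 2.0 * p

def bic(loglik, p, n):
    return -2.0 * loglik + p * math.log(n)

# Hypothetical maximized log-likelihoods for n = 150 points in d = 2 dimensions.
n, d = 150, 2
logliks = {(1, "full"): -520.0, (2, "full"): -430.0, (3, "full"): -425.0}

scores = {
    model: bic(ll, n_free_params(model[0], d, model[1]), n)
    for model, ll in logliks.items()
}
best_model = min(scores, key=scores.get)  # smaller BIC is better
```

The same loop can range over covariance structures as well as component counts, which is how the two model selection decisions described above are usually made together.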
Statistical methods are useful for characterizing seismic hazard because earthquakes are, for all practical purposes, random phenomena. They provide additional insight into the seismic hazard and risk problem. Seismic risk and earthquake occurrence probabilities can be estimated by using probability distributions. In this study, the Weibull, Log-normal, Log-logistic, Exponential, and Gamma distributions were examined to determine which one best fits the given data. The Kolmogorov-Smirnov test statistic was used to identify the distribution that best represents the earthquake data. The test indicated that the Weibull distribution is more appropriate than the other distributions.
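The comparison described above can be sketched as follows: compute the one-sample Kolmogorov-Smirnov statistic of the data against each candidate distribution and prefer the candidate with the smallest distance. This is a minimal pure-Python sketch; the sample values and the distribution parameters are hypothetical (they are not fitted values from the study), and only two of the five candidate families are shown.

```python
import math

def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic: the largest gap between the
    empirical CDF of `sample` and the candidate `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def exponential_cdf(rate):
    def cdf(x):
        return 1.0 - math.exp(-rate * x) if x > 0 else 0.0
    return cdf

def weibull_cdf(shape, scale):
    def cdf(x):
        return 1.0 - math.exp(-((x / scale) ** shape)) if x > 0 else 0.0
    return cdf

# Hypothetical data, for illustration only.
data = [0.3, 0.9, 1.4, 2.2, 3.1, 4.8, 6.0, 9.5]
mean = sum(data) / len(data)

candidates = {
    "Exponential": exponential_cdf(rate=1.0 / mean),  # rate set from the sample mean
    "Weibull": weibull_cdf(shape=0.9, scale=3.5),     # illustrative parameters, not fitted
}
distances = {name: ks_statistic(data, cdf) for name, cdf in candidates.items()}
best_fit = min(distances, key=distances.get)
```

When the parameters are estimated from the same data, the standard Kolmogorov-Smirnov critical values no longer apply exactly, so the statistic is better read as a relative measure of fit between candidates than as a formal test.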
Risk analyses of seismic activity are of great importance in determining earthquake interoccurrence times. Several statistical methods have been developed for this purpose. In recent work, the Exponential, Gamma, and Weibull distributions have been frequently used in this regard. In this study, we investigate the interoccurrence time statistics of earthquakes that occurred in the area bounded by 39°–42° N latitude and 30°–40° E longitude in the North Anatolian Fault Zone (NAFZ) between 1960 and 2008, using mixtures of two different distributions drawn from the Exponential, Gamma, and Weibull families as well as mixtures of two distributions of the same kind. We found that the mixture distributions are more suitable than the other examined distribution models for small magnitudes (mc ≥ 3). The Weibull-Gamma and Weibull-Exponential mixtures are also suitable for large magnitudes (mc ≥ 5).
Feature selection is one of the preprocessing steps applied to prepare data for analysis. Put simply, feature selection is the process of choosing the most suitable subset of features from the original feature set. These methods try to identify and remove irrelevant and redundant information in the original dataset. In this study, a new feature selection method based on the coefficient of variation and using class information is proposed. The effectiveness of the proposed method was evaluated on real datasets by comparison with other well-known feature selection methods. The performance of the feature selection methods was examined in terms of classification accuracy and entropy criteria in quadratic discriminant analysis. Three real datasets consisting of quantitative data, in which the number of observations exceeds the number of features, were used. For each feature selection method, quadratic discriminant analysis was performed using the first d features in the importance ranking produced by that method. The classification accuracy and entropy values of the feature selection methods in quadratic discriminant analysis were computed as a function of the number of features. The results show that the proposed feature selection method is a strong alternative to other well-known feature selection methods for classification analyses in terms of computational simplicity and effectiveness.