Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. However, clustering is a combinatorially difficult problem, and differences in assumptions and contexts across communities have made the transfer of useful generic concepts and methodologies slow to occur. This paper presents an overview of pattern clustering methods from a statistical pattern recognition perspective, with the goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners. We present a taxonomy of clustering techniques and identify cross-cutting themes and recent advances. We also describe some important applications of clustering algorithms, such as image segmentation, object recognition, and information retrieval.
In this paper, we propose a novel hybrid genetic algorithm (GA) that finds a globally optimal partition of a given data set into a specified number of clusters. GAs used earlier in clustering employ either an expensive crossover operator to generate valid child chromosomes from parent chromosomes, or a costly fitness function, or both. To circumvent these expensive operations, we hybridize the GA with a classical gradient descent algorithm used in clustering, namely the K-means algorithm; hence the name genetic K-means algorithm (GKA). We define the K-means operator, one step of the K-means algorithm, and use it in GKA as a search operator instead of crossover. We also define a biased mutation operator specific to clustering, called distance-based mutation. Using finite Markov chain theory, we prove that GKA converges to the global optimum. In simulations, GKA converges to the best known optimum for the given data, in agreement with the convergence result. GKA is also observed to search faster than some of the other evolutionary algorithms used for clustering.
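The two operators named in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes a chromosome is an integer label vector with one cluster index per data point. It also assumes the mutation weights take the form `c_m * d_max - d_j`, so that nearer centroids are more likely to be chosen. The function names, the `c_m` parameter, and the empty-cluster fallback are all hypothetical.

```python
import numpy as np

def centroids(X, labels, k):
    """Mean of each cluster; an empty cluster falls back to the global mean (assumption)."""
    C = np.empty((k, X.shape[1]))
    for j in range(k):
        members = X[labels == j]
        C[j] = members.mean(axis=0) if len(members) else X.mean(axis=0)
    return C

def point_centroid_distances(X, C):
    """Euclidean distance from every point to every centroid, shape (n, k)."""
    return np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)

def kmeans_operator(X, labels, k):
    """One K-means step: recompute centroids, then reassign each point to the nearest one."""
    C = centroids(X, labels, k)
    return point_centroid_distances(X, C).argmin(axis=1)

def distance_based_mutation(X, labels, k, p_mut, rng, c_m=1.5):
    """With probability p_mut per point, resample its cluster, favouring nearer centroids.

    The weight c_m * d_max - d_j is an assumed form of the bias, with c_m > 1.
    """
    d = point_centroid_distances(X, centroids(X, labels, k))
    mutated = labels.copy()
    for i in range(len(X)):
        if rng.random() < p_mut:
            w = c_m * d[i].max() - d[i]
            if w.sum() > 0:  # skip the degenerate case where all distances are zero
                mutated[i] = rng.choice(k, p=w / w.sum())
    return mutated

# Usage on a toy data set with one deliberately misassigned point.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels = np.array([0, 0, 0, 1])
rng = np.random.default_rng(0)
labels = distance_based_mutation(X, labels, k=2, p_mut=0.05, rng=rng)
labels = kmeans_operator(X, labels, k=2)
```

Using a single K-means step in place of crossover keeps every child a valid partition, which is the cost the abstract says earlier GA approaches paid for.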
The decline in vulture populations due to diclofenac poisoning has become an issue of some concern in India. This paper conducts a cost-benefit analysis of policy options to mitigate these damages. Vultures compete for food with feral dogs, a major source of rabies and bites. These human health impacts are found to be significant and may outweigh the costs of moving to alternative veterinary drugs. A preliminary survey of the Parsi community finds no spiritual values, though further work needs to be done on this issue. Even with a number of key benefits not valued (notably tourism and existence values), the net benefits of policies driven by vulture protection are found to be positive.
We present a fast iterative algorithm for identifying the Support Vectors of a given set of points. Our algorithm works by maintaining a candidate Support Vector set. It uses a greedy approach to pick points for inclusion in the candidate set. When the addition of a point to the candidate set is blocked by other points already present in the set, we use a backtracking approach to prune away such points. To speed up convergence, we initialize our algorithm with the nearest pair of points from opposite classes. We then use an optimization-based approach to increment or prune the candidate Support Vector set. The algorithm makes repeated passes over the data to satisfy the KKT constraints. The memory requirements of our algorithm scale as O(|S|^2) in the average case, where |S| is the size of the Support Vector set. We show that the algorithm is extremely competitive compared with other conventional iterative algorithms such as SMO and the NPA. We present results on a variety of real-life datasets to validate our claims.
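The initialization step and the KKT check described above can be illustrated with a short sketch. It covers only the starting point: the greedy growth and backtracking steps are not reproduced. It assumes labels in {-1, +1}, a linear kernel, and a hard margin, none of which the abstract specifies, and all function names are hypothetical. For a two-point candidate set, the maximum-margin hyperplane is the perpendicular bisector of the pair, scaled so that both points have margin exactly 1.

```python
import numpy as np

def nearest_opposite_pair(X, y):
    """Indices (i_pos, i_neg) of the closest pair of points from opposite classes."""
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == -1)
    d = np.linalg.norm(X[pos][:, None, :] - X[neg][None, :, :], axis=2)
    i, j = np.unravel_index(d.argmin(), d.shape)
    return pos[i], neg[j]

def two_point_hyperplane(x_pos, x_neg):
    """Hard-margin linear separator with x_pos and x_neg as its only support vectors."""
    diff = x_pos - x_neg
    w = 2.0 * diff / diff.dot(diff)
    b = -w.dot(x_pos + x_neg) / 2.0
    return w, b

def kkt_violators(X, y, w, b, tol=1e-9):
    """Points whose functional margin y * (w.x + b) falls below 1."""
    margins = y * (X @ w + b)
    return np.flatnonzero(margins < 1.0 - tol)

# Usage: initialize the candidate set, then scan for points that violate the constraints.
X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [5.0, 0.0]])
y = np.array([-1, -1, 1, 1])
i_pos, i_neg = nearest_opposite_pair(X, y)
w, b = two_point_hyperplane(X[i_pos], X[i_neg])
violators = kkt_violators(X, y, w, b)
```

A non-empty `violators` array would mark the points an iterative method could consider adding to the candidate set on its next pass.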
Technical and environmental efficiency of some coal-fired thermal power plants in India is estimated using a methodology that accounts for firms' efforts to increase the production of good output and reduce pollution with the given resources and technology. The methodology used is the directional output distance function. Estimates of firm-specific shadow prices of pollutants (bad outputs) and of the elasticity of substitution between good and bad outputs are also obtained. The technical and environmental inefficiency of a representative firm is estimated as 0.10, implying that the thermal power generating industry in the Indian state of Andhra Pradesh could increase production of electricity by 10 percent while decreasing generation of pollution by 10 percent. This result shows that there are incentives, or win-win opportunities, for the firms to voluntarily comply with environmental regulation. There is significant variation across firms in the marginal cost of pollution abatement (the shadow prices of bad outputs), and the marginal cost of abatement increases with the amount of pollution reduced. The variation across firms in marginal abatement cost and in compliance with regulation could be reduced by economic instruments such as an emission tax.
JEL Classification: Q25
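The reading of the 0.10 inefficiency estimate can be made explicit with the standard textbook form of the directional output distance function. The notation below is an assumption and is not taken from the paper: $x$ denotes inputs, $y$ good output, $b$ bad output, $P(x)$ the output set, and the direction vector is taken to be the observed output bundle itself.

```latex
% Directional output distance function with direction g = (g_y, -g_b):
\vec{D}_o(x, y, b;\, g_y, -g_b)
  = \max\{\beta : (y + \beta g_y,\; b - \beta g_b) \in P(x)\}

% With g_y = y and g_b = b (assumed direction), the frontier point is
(y + \beta y,\; b - \beta b) = \bigl((1+\beta)\,y,\; (1-\beta)\,b\bigr)

% so an estimate of \beta = 0.10 gives
\bigl(1.10\,y,\; 0.90\,b\bigr)
```

Under this choice of direction, a value of 0.10 corresponds to 10 percent more electricity together with 10 percent less pollution from the same inputs, and a value of zero would place the firm on the frontier.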