Abstract: This paper presents a clustering algorithm for partitioning a minimum spanning tree (MST) with a constraint on minimum group size. The problem is motivated by microaggregation, a disclosure limitation technique in which similar records are aggregated into groups containing at least k records. Heuristic clustering methods are needed because finding a microaggregation with minimum information loss is NP-hard. Our MST partitioning algorithm for microaggregation is efficient enough to be practical for large data sets and yields results comparable to those of the best available heuristic methods for microaggregation. For data with pronounced cluster structure, our method yields significantly lower information loss. The algorithm is general enough to accommodate different measures of information loss and can be used for other clustering applications that have a constraint on minimum group size.
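To make the idea of "partitioning an MST under a minimum group size" concrete, here is a minimal sketch. It is a simplified greedy variant, not the authors' algorithm: it builds the MST over all records with Prim's algorithm, then tries to cut edges from longest to shortest, keeping a cut only if every resulting group still has at least k records. Information loss is measured here as the within-group sum of squared errors (SSE), which is one common choice; the function names are hypothetical. Note that this sketch can leave groups larger than 2k - 1, which a full method would typically refine further.

```python
import math

def prim_mst(points):
    """Prim's algorithm on the complete Euclidean graph, O(n^2). Returns (u, v, weight) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection of each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def components(n, edges):
    """Connected components of the forest formed by `edges`, as sorted index lists."""
    adj = [[] for _ in range(n)]
    for u, v, _ in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, groups = [False] * n, []
    for s in range(n):
        if seen[s]:
            continue
        seen[s], stack, comp = True, [s], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    stack.append(v)
        groups.append(sorted(comp))
    return groups

def mst_partition(points, k):
    """Hypothetical greedy sketch: cut MST edges, longest first, while all groups keep >= k records."""
    edges = prim_mst(points)
    removed = set()
    for idx in sorted(range(len(edges)), key=lambda i: edges[i][2], reverse=True):
        trial = [e for i, e in enumerate(edges) if i not in removed and i != idx]
        if all(len(g) >= k for g in components(len(points), trial)):
            removed.add(idx)  # feasible cut: both new groups have at least k records
    kept = [e for i, e in enumerate(edges) if i not in removed]
    return components(len(points), kept)

def information_loss(points, groups):
    """Within-group SSE: squared Euclidean distance of each record to its group centroid."""
    total = 0.0
    for g in groups:
        centroid = [sum(points[i][d] for i in g) / len(g) for d in range(len(points[0]))]
        total += sum(math.dist(points[i], centroid) ** 2 for i in g)
    return total
```

For two well-separated triples such as `[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]` with k = 3, only the long edge joining the triples can be cut, so the sketch returns the two natural groups. Because each cut is accepted only if all groups stay at size k or more, the constraint holds after every step.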
Abstract: Microaggregation is a technique used by statistical agencies to limit disclosure of sensitive microdata. Noting that no polynomial-time algorithms are known for optimal microaggregation, Domingo-Ferrer and Mateo-Sanz presented heuristic microaggregation methods. This paper is the first to present an efficient polynomial-time algorithm for optimal univariate microaggregation. Optimal partitions are shown to correspond to shortest paths in a network.
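The shortest-path correspondence can be illustrated with a small sketch. It assumes SSE as the information-loss measure and relies on two standard observations: after sorting, optimal groups are contiguous runs of values, and groups can be restricted to sizes k through 2k - 1 (a group of 2k or more records can be split into two groups of at least k without increasing SSE). Node i marks a boundary after the i-th sorted value, and an arc (i, j) with k <= j - i <= 2k - 1 stands for the group of values i+1..j, weighted by its SSE. Because the network is acyclic and its nodes are already in topological order, a simple dynamic program finds the shortest path from node 0 to node n. The function name `optimal_univariate` is hypothetical.

```python
import math

def optimal_univariate(values, k):
    """Return (min SSE, groups) for univariate microaggregation with group sizes k..2k-1."""
    x = sorted(values)
    n = len(x)
    if n < k:
        raise ValueError("need at least k values")
    # Prefix sums of values and squares give the SSE of any run x[i:j] in O(1).
    s, q = [0.0] * (n + 1), [0.0] * (n + 1)
    for i, v in enumerate(x):
        s[i + 1] = s[i] + v
        q[i + 1] = q[i] + v * v

    def arc_cost(i, j):
        t = s[j] - s[i]
        return (q[j] - q[i]) - t * t / (j - i)

    # Shortest path over the DAG: dist[j] = min over arcs (i, j) of dist[i] + arc_cost(i, j).
    dist, pred = [math.inf] * (n + 1), [-1] * (n + 1)
    dist[0] = 0.0
    for j in range(k, n + 1):
        for i in range(max(0, j - 2 * k + 1), j - k + 1):
            if dist[i] < math.inf and dist[i] + arc_cost(i, j) < dist[j]:
                dist[j], pred[j] = dist[i] + arc_cost(i, j), i
    # Walk predecessors back from node n to recover the groups.
    groups, j = [], n
    while j > 0:
        groups.append(x[pred[j]:j])
        j = pred[j]
    groups.reverse()
    return dist[n], groups
```

With values `[12, 1, 11, 3, 10, 2]` and k = 3, the only admissible split is two groups of three, and the path picks `[1, 2, 3]` and `[10, 11, 12]` with total SSE 4. The double loop examines at most k arcs per node, so this sketch runs in O(nk) after an O(n log n) sort.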
Summary
Trillions of dollars are traded daily on the foreign exchange (forex) market, making it the largest financial market in the world. Accurate forecasting of forex rates is a necessary element of any effective hedging or speculation strategy in the forex market. Time series models and shallow neural networks provide acceptable point estimates of future rates but are poor at predicting the direction of change and, hence, are of limited use in supporting profitable trading strategies. Machine learning classifiers trained on features hand-crafted from domain knowledge produce marginally better results. The recent success of deep networks is partly attributable to their ability to learn abstract features from raw data. This motivates us to investigate the ability of deep convolutional neural networks to predict the direction of change in forex rates. Exchange rates for the currency pairs EUR/USD, GBP/USD and JPY/USD are used in experiments. Results demonstrate that trained deep networks achieve satisfactory out-of-sample prediction accuracy.
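The abstract frames forecasting as predicting the direction of change rather than the rate itself. The summary does not describe the input pipeline, so the following is a hypothetical preprocessing sketch, not the authors' setup: it slices a rate series into fixed-length windows that a convolutional network could take as input, labels each window by whether the next rate is higher, and scores predictions by directional accuracy. The window length, the rule that a flat move counts as "not up", and the momentum baseline are all assumptions made for illustration.

```python
def make_windows(rates, window):
    """Turn a rate series into (input window, direction label) pairs.

    Label is 1 if the rate right after the window is strictly higher than the
    window's last rate, else 0 (assumption: flat moves count as 'not up').
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    X, y = [], []
    for t in range(window, len(rates)):
        w = rates[t - window:t]
        X.append(list(w))
        y.append(1 if rates[t] > w[-1] else 0)
    return X, y

def directional_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted direction matches the actual direction."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)

def momentum_baseline(X):
    """Hypothetical naive baseline (needs window >= 2): repeat the last observed move."""
    return [1 if w[-1] > w[-2] else 0 for w in X]
```

In this framing, a trained network would replace `momentum_baseline` as the source of `y_pred`, and out-of-sample accuracy would be computed on windows drawn from a later period than the training data.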
The k-means algorithm is widely used for clustering because of its computational efficiency. Given n points in d-dimensional space and the desired number of clusters k, k-means seeks a set of k cluster centers that minimizes the sum of the squared Euclidean distances between each point and its nearest cluster center. However, the algorithm is very sensitive to the initial selection of centers and is likely to converge to partitions that are significantly inferior to the global optimum. We present a genetic algorithm (GA) for evolving centers in the k-means algorithm that simultaneously identifies good partitions for a range of values around a specified k. The set of centers is represented using a hyper-quadtree constructed on the data. This representation is exploited in our GA to generate an initial population of good centers and to support a novel crossover operation that selectively passes good subsets of neighboring centers from parents to offspring by swapping subtrees. Experimental results indicate that our GA finds the global optimum for data sets with known optima and finds good solutions for large simulated data sets.
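The sensitivity to initial centers that motivates the GA is easy to reproduce. The sketch below implements the k-means objective described above and standard Lloyd iterations; it does not implement the paper's GA or hyper-quadtree, and the function names are hypothetical. On four points at the corners of a 4-by-1 rectangle with k = 2, one initialization converges to the optimal vertical split, while another gets stuck in a horizontal split whose objective is 16 times worse.

```python
import math

def kmeans_sse(points, centers):
    """k-means objective: sum of squared distances from each point to its nearest center."""
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)

def lloyd(points, centers, max_iter=100):
    """Standard Lloyd iterations starting from the given initial centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # Move each center to the mean of its members; an empty cluster keeps its center.
        new_centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else c
            for c, members in zip(centers, clusters)
        ]
        if new_centers == centers:  # converged: the centers no longer move
            break
        centers = new_centers
    return centers
```

Starting from `[(0, 0.5), (4, 0.5)]` gives an objective of 1.0, whereas starting from `[(2, 0), (2, 1)]` is already a fixed point with objective 16.0. A GA such as the one described above searches over initial center sets so that this local refinement starts from better places.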