Abstract. In this paper, we address the problem of protecting the underlying attribute values when sharing data for clustering. The challenge is to meet privacy requirements while still guaranteeing valid clustering results. To achieve this dual goal, we propose a novel spatial data transformation method called Rotation-Based Transformation (RBT). The major features of our data transformation are: (a) it is independent of any clustering algorithm; (b) it has a sound mathematical foundation; (c) it is efficient and accurate; and (d) it does not rely on intractability hypotheses from algebra and does not require CPU-intensive operations. We show analytically that, although the data are transformed to achieve privacy, accurate clustering results can still be obtained because the transformation preserves the global distances between data points.
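The distance-preservation claim can be checked numerically. The following is a minimal sketch, assuming a plain 2D rotation applied to one pair of attributes. The function names `rotate_pair` and `euclidean`, the sample values, and the angle are hypothetical, and the sketch does not reproduce how the paper selects attribute pairs or rotation angles.

```python
import math

def rotate_pair(points, theta_deg):
    """Rotate each (x, y) point counterclockwise about the origin."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def euclidean(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Hypothetical attribute pairs for three records (illustrative values only).
points = [(1.0, 2.0), (4.0, 6.0), (-3.0, 0.5)]
rotated = rotate_pair(points, 37.0)  # the angle is an arbitrary assumption

# Attribute values change, but every pairwise distance is unchanged
# (up to floating-point error), so distance-based clustering is unaffected.
for i in range(len(points)):
    for j in range(i + 1, len(points)):
        before = euclidean(points[i], points[j])
        after = euclidean(rotated[i], rotated[j])
        assert abs(before - after) < 1e-9
```

Because a rotation is an isometry, any clustering algorithm that depends only on pairwise Euclidean distances should produce the same partition on the rotated data as on the original.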
The sharing of association rules has been proven beneficial in business collaboration, but requires privacy safeguards. One may decide to disclose only part of the knowledge and conceal strategic patterns called sensitive rules. These sensitive rules must be protected before sharing since they are paramount for strategic decisions and need to remain private. Some companies prefer to share their data for collaboration, while others prefer to share only the patterns discovered from their data. The challenge here is how to protect the sensitive rules without putting at risk the effectiveness of data mining per se. To address this challenging problem, we propose a unified framework which combines techniques for efficiently hiding sensitive rules: a set of algorithms to protect sensitive knowledge in transactional databases; retrieval facilities to speed up the process of protecting sensitive knowledge; and a set of metrics to evaluate the effectiveness of the proposed algorithms in terms of information loss and to quantify how much private information has been disclosed. Our experiments demonstrate that our framework is effective and achieves significant improvement over the other approaches presented in the literature.
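To make the idea of hiding a sensitive rule concrete, here is a minimal sketch under stated assumptions. It is not one of the framework's algorithms: it uses a naive strategy (deleting one consequent item from supporting transactions until the rule's support drops below a threshold), and the names `support`, `confidence`, `hide_rule`, the toy database, and the threshold are all hypothetical. The count of removed items stands in as a crude information-loss measure.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Support of the whole rule divided by support of its antecedent."""
    sup_a = support(transactions, antecedent)
    if sup_a == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / sup_a

def hide_rule(transactions, antecedent, consequent, min_support):
    """Naively hide antecedent -> consequent by lowering its support.

    Returns a sanitized copy of the database and the number of item
    deletions performed (a crude proxy for information loss).
    """
    rule_items = set(antecedent) | set(consequent)
    victim = sorted(consequent)[0]          # item to delete; arbitrary choice
    sanitized = [set(t) for t in transactions]
    removed = 0
    for t in sanitized:
        if support(sanitized, rule_items) < min_support:
            break                            # rule is no longer frequent
        if rule_items <= t:
            t.discard(victim)
            removed += 1
    return sanitized, removed

# Hypothetical toy database; the sensitive rule is {bread} -> {milk}.
db = [{"bread", "milk"}, {"bread", "milk", "eggs"},
      {"bread", "milk"}, {"eggs"}, {"bread"}]
sanitized, removed = hide_rule(db, {"bread"}, {"milk"}, min_support=0.5)
```

In this toy case one deletion is enough to push the rule below the threshold. A practical algorithm would also weigh which transactions and items to alter so that non-sensitive rules are affected as little as possible.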
Abstract. The sharing of association rules is often beneficial in industry, but requires privacy safeguards. One may decide to disclose only part of the knowledge and conceal strategic patterns, which we call restrictive rules. These restrictive rules must be protected before sharing since they are paramount for strategic decisions and need to remain private. To address this challenging problem, we propose a unified framework for protecting sensitive knowledge before sharing. This framework encompasses: (a) an algorithm that sanitizes restrictive rules while blocking some inference channels, which we validate against real and synthetic datasets; and (b) a set of metrics to evaluate attacks against sensitive knowledge and the impact of the sanitization. We also introduce a taxonomy of sanitizing algorithms and a taxonomy of attacks against sensitive knowledge.
Heat waves usually cause losses in animal production, since animals exposed to thermal stress suffer increased mortality, with consequent economic losses. Animal science and meteorological databases from recent years contain enough data on the poultry production business to allow the modeling of mortality losses due to heat wave incidence. This research analyzes a database of broiler production associated with climatic data, using data mining techniques such as attribute selection and data classification (decision tree) to model the impact of heat wave incidence on broiler mortality. The temperature and humidity index (THI) was used for screening environmental data. The data mining techniques allowed the development of three comprehensible models for estimating high mortality specifically during broiler production. Two models, built with Principal Component Analysis (PCA) and Wrapper feature selection approaches, yielded a classification accuracy of 89.3%. Both models obtained a class precision of 0.83 for classifying high mortality. When the feature selection was made by the domain experts, the model accuracy reached 85.7%, while the class precision for high mortality was 0.76. Meteorological data and the calculated THI from meteorological stations were helpful to select the range of harmful environmental conditions for broilers 29 and 42 days old. The data mining techniques were useful for building animal production models.
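For readers unfamiliar with the two reported measures, the sketch below shows how classification accuracy and class precision are conventionally computed. The label lists are hypothetical and are not the study's data; the function names `accuracy` and `class_precision` are likewise assumptions made for illustration.

```python
def accuracy(y_true, y_pred):
    """Share of all predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def class_precision(y_true, y_pred, cls):
    """Of the samples predicted as cls, the share that truly are cls."""
    predicted = [t for t, p in zip(y_true, y_pred) if p == cls]
    if not predicted:
        return 0.0
    return sum(t == cls for t in predicted) / len(predicted)

# Hypothetical labels for six flocks (illustrative only).
y_true = ["high", "high", "normal", "normal", "normal", "high"]
y_pred = ["high", "normal", "normal", "normal", "high", "high"]

acc = accuracy(y_true, y_pred)
prec_high = class_precision(y_true, y_pred, "high")
```

Accuracy summarizes all classes at once, whereas class precision for "high" indicates how often a high-mortality prediction is correct, which is why the two figures can differ.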
Analysis of rainfall homogeneous areas in time series of precipitation in the State of Bahia, Brazil
Abstract. The aim of this study was to identify rainfall homogeneous areas in the State of Bahia, Brazil, and to analyze the climatic conditions of each area for the period between 1981 and 2010. A data mining technique, clustering (grouping of data), was applied by means of the k-means algorithm to transform the historical precipitation series into five rainfall homogeneous areas, in response to topography, maritime influence, and the weather systems operating in the region. Average monthly rainfall data from 92 meteorological stations were used. The results indicate that the driest areas are situated in the central part of the state, from north to south, mainly in the north, which has the lowest annual volumes, around 480 mm. The area located in the north of the state contrasts with the coastal strip, where the largest annual rainfall volumes were observed (approximately 1,380 mm). High rainfall variability occurs in almost all areas, especially in two of the semiarid ones, with coefficients of variation (CV) of 42 and 28%. The coastal area differs in this respect, showing regular rainfall throughout the year and a CV of 15%. The rainy and dry seasons are well defined. Precipitation in the rainy season accounts for about 81% of the annual totals, most notably in the areas located in the central-west and west of the state, with 95 and 96% of the annual totals.
Keywords: data mining, clustering, rainfall variability.
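The clustering and variability steps described above can be sketched as follows. This is a minimal illustration, assuming a plain Lloyd-style k-means with random initialization and a population-based coefficient of variation. The station profiles are hypothetical and use three values per station for brevity (the study used monthly means), and the names `kmeans` and `coefficient_of_variation` are not from the paper.

```python
import math
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=50, seed=0):
    """Basic k-means: returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)        # initial centroids from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Update step: mean of each cluster (keep old centroid if empty).
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members)
                                     for c in zip(*members)))
            else:
                updated.append(centroids[j])
        if updated == centroids:
            break                            # converged
        centroids = updated
    return labels, centroids

def coefficient_of_variation(values):
    """Population standard deviation over the mean, in percent."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return 100.0 * math.sqrt(var) / mean

# Hypothetical precipitation profiles in mm (illustrative values only).
stations = [(40.0, 30.0, 10.0), (45.0, 35.0, 12.0), (38.0, 28.0, 9.0),
            (120.0, 110.0, 100.0), (125.0, 115.0, 105.0), (118.0, 108.0, 98.0)]
labels, centroids = kmeans(stations, k=2)
cv_example = coefficient_of_variation([50.0, 100.0, 150.0])
```

In the study the same idea would be applied with k = 5 and one vector per station; results from k-means depend on initialization, so repeated runs with different seeds are a common precaution.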