A constant and controlled level of emission of carbon and other gases into the atmosphere is a pre-condition for preventing global warming and an essential issue for a sustainable world. Fires in the natural environment are phenomena that extensively increase the level of greenhouse emissions and disturb the normal functioning of natural ecosystems. Therefore, estimating the risk of fire outbreaks and fire prevention are the first steps in reducing the damage caused by fire. In this study, we build predictive models to estimate the risk of fire outbreaks in Slovenia, using data from a GIS, Remote Sensing imagery and the weather prediction model ALADIN.The study is carried out on three datasets, from three regions: one for the Kras region, one for the coastal region and one for continental Slovenia. On these datasets, we apply both classical statistical approaches and state-of-the-art data mining algorithms, such as ensembles of decision trees, in order to obtain predictive models of fire outbreaks.Responsible editor: Katharina Morik, Kanishka Bhaduri and Hillol Kargupta. This paper has its origins in a project report ) and a short conference paper (Stojanova et al. 2006) that introduced the problem of forest fire prediction in Slovenia, using GIS, RS and meteorological data. However, this paper significantly extends and upgrades the work presented there. In particular: We consider a wider set of data mining techniques, from single classifiers to ensembles; We present a comparison of the predictive performance in terms of several frequently used evaluation measures for classification; We present an example of the results obtained from the modeling task in the form of decision rules, explain and interpret their meaning; We generate geographical maps and compare them with other fire prediction models (e.g., FWI fire risk danger maps) provided by other services.
BackgroundOntologies and catalogs of gene functions, such as the Gene Ontology (GO) and MIPS-FUN, assume that functional classes are organized hierarchically, that is, general functions include more specific ones. This has recently motivated the development of several machine learning algorithms for gene function prediction that leverages on this hierarchical organization where instances may belong to multiple classes. In addition, it is possible to exploit relationships among examples, since it is plausible that related genes tend to share functional annotations. Although these relationships have been identified and extensively studied in the area of protein-protein interaction (PPI) networks, they have not received much attention in hierarchical and multi-class gene function prediction. Relations between genes introduce autocorrelation in functional annotations and violate the assumption that instances are independently and identically distributed (i.i.d.), which underlines most machine learning algorithms. Although the explicit consideration of these relations brings additional complexity to the learning process, we expect substantial benefits in predictive accuracy of learned classifiers.ResultsThis article demonstrates the benefits (in terms of predictive accuracy) of considering autocorrelation in multi-class gene function prediction. We develop a tree-based algorithm for considering network autocorrelation in the setting of Hierarchical Multi-label Classification (HMC). We empirically evaluate the proposed algorithm, called NHMC (Network Hierarchical Multi-label Classification), on 12 yeast datasets using each of the MIPS-FUN and GO annotation schemes and exploiting 2 different PPI networks. The results clearly show that taking autocorrelation into account improves the predictive performance of the learned models for predicting gene function.ConclusionsOur newly developed method for HMC takes into account network information in the learning phase: When used for gene function prediction in the context of PPI networks, the explicit consideration of network autocorrelation increases the predictive performance of the learned models. Overall, we found that this holds for different gene features/ descriptions, functional annotation schemes, and PPI networks: Best results are achieved when the PPI network is dense and contains a large proportion of function-relevant interactions.
Spatial autocorrelation is the correlation among data values which is strictly due to the relative spatial proximity of the objects that the data refer to. Inappropriate treatment of data with spatial dependencies, where spatial autocorrelation is ignored, can obfuscate important insights. In this paper, we propose a data mining method that explicitly considers spatial autocorrelation in the values of the response (target) variable when learning predictive clustering models. The method is based on the concept of predictive clustering trees (PCTs), according to which hierarchies of clusters of similar data are identified and a predictive model is associated to each cluster. In particular, our approach is able to learn predictive models for both a continuous response (regression task) and a discrete response (classification task). We evaluate our approach on several real world problems of spatial regression and spatial classification. The consideration of the autocorrelation in the models improves predictions that are consistently clustered in space and that clusters try to preserve the spatial arrangement of the data, at the same time providing a multi-level insight into the spatial autocorrelation phenomenon. The evaluation of SCLUS in several ecological domains (e.g. predicting outcrossing rates within a conventional field due to the surrounding genetically modified fields, as well as predicting pollen dispersal rates from two lines of plants) confirms its capability of building spatial aware models which capture the spatial distribution of the target variable. In general, the maps obtained by using SCLUS do not require further post-smoothing of the results if we want to use them in practice.
Regression inference in network data is a challenging task in machine learning and data mining. Network data describe entities represented by nodes, which may be connected with (related to) each other by edges. Many network datasets are characterized by a form of autocorrelation where the values of the response variable at a given node depend on the values of the variables (predictor and response) at the nodes connected to the given node. This phenomenon is a direct violation of the assumption of independent (i.i.d.) observations: At the same time, it offers a unique opportunity to improve the performance of predictive models on network data, as inferences about one entity can be used to improve inferences about related entities. In this paper, we propose a data mining method that explicitly considers autocorrelation when building regression models from network data. The method is based on the concept of predictive clustering trees (PCTs), which can be used both for clustering and predictive tasks: PCTs are decision trees viewed as hierarchies of clusters and provide symbolic descriptions of the clusters. In addition, PCTs can be used for multi-objective prediction problems, including multi-target regression and multi-target classification. Empirical results on real world problems of network regression show that the proposed extension of PCTs performs better than traditional decision tree induction when autocorrelation is present in the data
The potential usage of zeolites as adsorbents for the removal of organic molecules from water was investigated in a series of experiments with aqueous solutions of lower alcohols. This could represent a simple solution to the problem of cleaning up industrial wastewater as well as recovering valuable chemicals at relatively low costs. Adsorption isotherms of the Langmuir type were applied, and calculations showed that the amount of propanol adsorbed on silicalite corresponded to approximately 70% of the pore volume. The adsorption process is simple, and recovery of the more concentrated products is easily done by heat treatment and/or at lowered pressures. Adsorption experiments with aqueous acetone showed that silicalite had approximately the same adsorption capacity for acetone as for n-propanol. Heats of adsorption were determined calorimetrically
Network data describe entities represented by nodes, which may be connected with (related to) each other by edges.Many network datasets are characterized by a form of autocorrelation, where the value of a variable at a given node depends on the values of variables at the nodes it is connected with. This phenomenon is a direct violation of the assumption that data are independently and identically distributed. At the same time, it offers an unique opportunity to improve the performance of predictive models on network data, as inferences about one entity can be used to improve inferences about related entities. Regression inference in network data is a challenging task. While many approaches for network classification exist, there are very few approaches for network regression. In this paper, we propose a data mining algorithm, calledNCLUS, that explicitly considers autocorrelationwhen building regression models from network data. The algorithm is based on the concept of predictive clustering trees (PCTs) that can be used for clustering, prediction and multitarget prediction, including multi-target regression and multi-target classification.We evaluate our approach on several real world problems of network regression, coming from the areas of social and spatial networks. Empirical results showthat our algorithm performs better than PCTs learned by completely disregarding network information, as well as PCTs that are tailored for spatial data, but do not take autocorrelation into account, and a variety of other existing approache
Spatial autocorrelation is the correlation among data values, strictly due to the relative location proximity of the objects that the data refer to. This statistical property clearly indicates a violation of the assumption of observation independence - a pre-condition assumed by most of the data mining and statistical models. Inappropriate treatment of data with spatial dependencies could obfuscate important insights when spatial autocorrelation is ignored. In this paper, we propose a data mining method that explicitly considers autocorrelation when building the models. The method is based on the concept of predictive clustering trees (PCTs). The proposed approach combines the possibility of capturing both global and local effects and dealing with positive spatial autocorrelation. The discovered models adapt to local properties of the data, providing at the same time spatially smoothed predictions. Results show the effectiveness of the proposed solution
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.