Abstract. We empirically evaluate several state-of-the-art methods for constructing ensembles of heterogeneous classifiers with stacking and show that they perform (at best) comparably to selecting the best classifier from the ensemble by cross validation. Among state-of-the-art stacking methods, stacking with probability distributions and multi-response linear regression performs best. We propose two extensions of this method, one using an extended set of meta-level features and the other using multi-response model trees to learn at the meta-level. We show that the latter extension performs better than existing stacking approaches and better than selecting the best classifier by cross validation.
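As an illustration of the meta-level step described above, here is a minimal sketch of multi-response linear regression (MLR) over class probability distributions. It assumes the meta-level features (the concatenated probability distributions predicted by the base-level classifiers, normally obtained by cross validation) are already available. The function names `fit_mlr` and `predict_mlr` and all numbers in the toy data are hypothetical, chosen only for illustration; they are not taken from the paper.

```python
import numpy as np

def _add_bias(meta_features):
    # Append a constant column so each per-class linear model has an intercept.
    n = meta_features.shape[0]
    return np.hstack([meta_features, np.ones((n, 1))])

def fit_mlr(meta_features, labels, n_classes):
    """Fit one least-squares linear model per class.

    Each class gets a 0/1 indicator target; the columns of the returned
    weight matrix are the per-class regression coefficients.
    """
    X = _add_bias(np.asarray(meta_features, dtype=float))
    Y = np.eye(n_classes)[np.asarray(labels)]  # one-hot indicator targets
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def predict_mlr(W, meta_features):
    """Predict the class whose linear model gives the largest response."""
    X = _add_bias(np.asarray(meta_features, dtype=float))
    return np.argmax(X @ W, axis=1)

# Toy meta-level data (hypothetical): two base classifiers, two classes.
# Columns: [clf1 P(c0), clf1 P(c1), clf2 P(c0), clf2 P(c1)]
train_meta = np.array([
    [0.9, 0.1, 0.8, 0.2],
    [0.7, 0.3, 0.4, 0.6],
    [0.2, 0.8, 0.3, 0.7],
    [0.4, 0.6, 0.1, 0.9],
    [0.6, 0.4, 0.7, 0.3],
    [0.3, 0.7, 0.6, 0.4],
])
train_labels = np.array([0, 0, 1, 1, 0, 1])

W = fit_mlr(train_meta, train_labels, n_classes=2)
train_pred = predict_mlr(W, train_meta)
new_pred = predict_mlr(W, np.array([[0.8, 0.2, 0.9, 0.1],
                                    [0.1, 0.9, 0.2, 0.8]]))
```

In this sketch the second base classifier disagrees with the true label on two training examples, and the meta-level regression learns to weight the first classifier more heavily. The extensions proposed in the abstract (extra meta-level features, model trees in place of linear regression) are not shown.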
Nm23-H1 is one of the most interesting candidate genes for a relevant role in Neuroblastoma pathogenesis. H-Prune is the best-characterized Nm23-H1 binding partner, and its overexpression has been shown in different human cancers. Our study focuses on the role of the Nm23-H1/h-Prune protein complex in Neuroblastoma. Using NMR spectroscopy, we performed a conformational analysis of the h-Prune C-terminal to identify the amino acids involved in the interaction with Nm23-H1. We developed a competitive permeable peptide (CPP) to impair the formation of the Nm23-H1/h-Prune complex and demonstrated that CPP impairs cell motility and substantially impairs tumor growth and metastasis formation. Meta-analysis performed on three Neuroblastoma cohorts showed Nm23-H1 as the gene highly associated with Neuroblastoma aggressiveness. We also identified two other proteins (PTPRA and TRIM22) with expression levels significantly affected by CPP. These data suggest a new avenue for potential clinical application of CPP in Neuroblastoma treatment.
C. difficile infection is associated with disturbed gut microbiota, and changes in the relative frequencies and abundance of individual bacterial taxa have been described. In this study we have analysed bacterial, fungal and archaeal microbiota by denaturing high pressure liquid chromatography (DHPLC) and with machine learning methods in 208 faecal samples from healthy volunteers and in routine samples with requested C. difficile testing. The latter were further divided according to stool consistency, C. difficile presence or absence and C. difficile ribotype (027 or non-027). Lower microbiota diversity was a common trait of all routine samples and not necessarily connected only to C. difficile colonisation. Differences between the healthy donors and C. difficile positive routine samples were detected in bacterial, fungal and archaeal components. Bifidobacterium longum was the single most important species associated with C. difficile negative samples. However, by machine learning approaches we have identified patterns of microbiota composition predictive of C. difficile colonisation. Those patterns also differed between samples with C. difficile ribotype 027 and other C. difficile ribotypes. The results indicate that not only the presence of a single species/group is important but that certain combinations of gut microbes are associated with C. difficile carriage and that some ribotypes (027) might be associated with more disturbed microbiota than others.
A constant and controlled level of emission of carbon and other gases into the atmosphere is a pre-condition for preventing global warming and an essential issue for a sustainable world. Fires in the natural environment are phenomena that extensively increase the level of greenhouse emissions and disturb the normal functioning of natural ecosystems. Therefore, estimating the risk of fire outbreaks and fire prevention are the first steps in reducing the damage caused by fire. In this study, we build predictive models to estimate the risk of fire outbreaks in Slovenia, using data from a GIS, Remote Sensing imagery and the weather prediction model ALADIN. The study is carried out on three datasets, from three regions: one for the Kras region, one for the coastal region and one for continental Slovenia. On these datasets, we apply both classical statistical approaches and state-of-the-art data mining algorithms, such as ensembles of decision trees, in order to obtain predictive models of fire outbreaks. Responsible editors: Katharina Morik, Kanishka Bhaduri and Hillol Kargupta. This paper has its origins in a project report and a short conference paper (Stojanova et al. 2006) that introduced the problem of forest fire prediction in Slovenia, using GIS, RS and meteorological data. However, this paper significantly extends and upgrades the work presented there. In particular, we consider a wider set of data mining techniques, from single classifiers to ensembles; we present a comparison of the predictive performance in terms of several frequently used evaluation measures for classification; we present an example of the results obtained from the modeling task in the form of decision rules, and explain and interpret their meaning; and we generate geographical maps and compare them with other fire prediction models (e.g., FWI fire risk danger maps) provided by other services.
Abstract. The two most commonly addressed data mining tasks are predictive modelling and clustering. Here we address the task of predictive clustering, which contains elements of both and generalizes them to some extent. We propose a novel approach to predictive clustering called predictive clustering rules, present an initial implementation and its preliminary experimental evaluation.