Jan Struyf scite author profile

Hierarchical multi-label classification (HMC) is a variant of classification where instances may belong to multiple classes at the same time and these classes are organized in a hierarchy. This article presents several approaches to the induction of decision trees for HMC, as well as an empirical study of their use in functional genomics. We compare learning a single HMC tree (which makes predictions for all classes together) to two approaches that learn a set of regular classification trees (one for each class). The first approach defines an independent single-label classification task for each class (SC). Obviously, the hierarchy introduces dependencies between the classes. While they are ignored by the first approach, they are exploited by the second approach, named hierarchical single-label classification (HSC). Depending on the application at hand, the hierarchy of classes can be such that each class has at most one parent (tree structure) or such that classes may have multiple parents (DAG structure). The latter case has not been considered before and we show how the HMC and HSC approaches can be modified to support this setting. We compare the three approaches on 24 yeast data sets using as classification schemes MIPS's FunCat (tree structure) and the Gene Ontology (DAG structure). We show that HMC trees outperform HSC and SC trees along three dimensions: predictive accuracy, model size, and induction Editor: Johannes Fürnkranz. Learn (2008) 73: 185-214 time. We conclude that HMC trees should definitely be considered in HMC tasks where interpretable models are desired.

show abstract

Tree ensembles for predicting structured outputs

Kocev

Vens

Struyf

et al. 2013

Pattern Recognition

234

255

View full text Add to dashboard Cite

In this article, we address the task of learning models for predicting structured outputs. We consider both global and local prediction of structured outputs, the former based on a single model that predicts the entire output structure and the latter based on a collection of models, each predicting a component of the output structure. We use ensemble methods and apply them in the context of predicting structured outputs. We propose to build ensemble models consisting of predictive clustering trees, which generalize classification trees: these have been used for predicting different types of structured outputs, both locally and globally. More specifically, we develop methods for learning two types of ensembles (bagging and random forests) of predictive clustering trees for global and local prediction of different types of structured outputs. The types of outputs considered correspond to different predictive modelling tasks: multi-target regression, multi-target classification, and hierarchical multi-label classification. Each of the combinations can be applied both in the context of global prediction (producing a single ensemble) or local prediction (producing a collection of ensembles). We conduct an extensive experimental evaluation across a range of benchmark datasets for each of the three types of structured outputs. We compare ensembles for global and local prediction, as well as single trees for global prediction and tree collections for local prediction, both in terms of predictive performance and in terms of efficiency (running times and model complexity). The results show that both global and local tree ensembles perform better than the single model counterparts in terms of predictive power. Global and local tree ensembles perform equally well, with global ensembles being more efficient and producing smaller models, as well as needing fewer trees in the ensemble to achieve the maximal performance.

show abstract

Predicting gene function using hierarchical multi-label decision tree ensembles

et al. 2010

View full text Add to dashboard Cite

BackgroundS. cerevisiae, A. thaliana and M. musculus are well-studied organisms in biology and the sequencing of their genomes was completed many years ago. It is still a challenge, however, to develop methods that assign biological functions to the ORFs in these genomes automatically. Different machine learning methods have been proposed to this end, but it remains unclear which method is to be preferred in terms of predictive performance, efficiency and usability.ResultsWe study the use of decision tree based models for predicting the multiple functions of ORFs. First, we describe an algorithm for learning hierarchical multi-label decision trees. These can simultaneously predict all the functions of an ORF, while respecting a given hierarchy of gene functions (such as FunCat or GO). We present new results obtained with this algorithm, showing that the trees found by it exhibit clearly better predictive performance than the trees found by previously described methods. Nevertheless, the predictive performance of individual trees is lower than that of some recently proposed statistical learning methods. We show that ensembles of such trees are more accurate than single trees and are competitive with state-of-the-art statistical learning and functional linkage methods. Moreover, the ensemble method is computationally efficient and easy to use.ConclusionsOur results suggest that decision tree based methods are a state-of-the-art, efficient and easy-to-use approach to ORF function prediction.

show abstract

Ensembles of Multi-Objective Decision Trees

et al.

View full text Add to dashboard Cite

Abstract. Ensemble methods are able to improve the predictive performance of many base classifiers. Up till now, they have been applied to classifiers that predict a single target attribute. Given the non-trivial interactions that may occur among the different targets in multi-objective prediction tasks, it is unclear whether ensemble methods also improve the performance in this setting. In this paper, we consider two ensemble learning techniques, bagging and random forests, and apply them to multi-objective decision trees (MODTs), which are decision trees that predict multiple target attributes at once. We empirically investigate the performance of ensembles of MODTs. Our most important conclusions are: (1) ensembles of MODTs yield better predictive performance than MODTs, and (2) ensembles of MODTs are equally good, or better than ensembles of single-objective decision trees, i.e., a set of ensembles for each target. Moreover, ensembles of MODTs have smaller model size and are faster to learn than ensembles of single-objective decision trees.

show abstract

Decision Trees for Hierarchical Multilabel Classification: A Case Study in Functional Genomics

Blockeel

Schietgat

Struyf

et al. 2006

View full text Add to dashboard Cite

Abstract. Hierarchical multilabel classification (HMC) is a variant of classification where instances may belong to multiple classes organized in a hierarchy. The task is relevant for several application domains. This paper presents an empirical study of decision tree approaches to HMC in the area of functional genomics. We compare learning a single HMC tree (which makes predictions for all classes together) to learning a set of regular classification trees (one for each class). Interestingly, on all 12 datasets we use, the HMC tree wins on all fronts: it is faster to learn and to apply, easier to interpret, and has similar or better predictive performance than the set of regular trees. It turns out that HMC tree learning is more robust to overfitting than regular tree learning.

show abstract

Constraint Based Induction of Multi-objective Regression Trees

Struyf

Džeroski

2006

View full text Add to dashboard Cite

Abstract. Constrained based inductive systems are a key component of inductive databases and responsible for building the models that satisfy the constraints in the inductive queries. In this paper, we propose a constraint based system for building multi-objective regression trees. A multi-objective regression tree is a decision tree capable of predicting several numeric variables at once. We focus on size and accuracy constraints. By either specifying maximum size or minimum accuracy, the user can trade-off size (and thus interpretability) for accuracy. Our approach is to first build a large tree based on the training data and to prune it in a second step to satisfy the user constraints. This has the advantage that the tree can be stored in the inductive database and used for answering inductive queries with different constraints. Besides size and accuracy constraints, we also briefly discuss syntactic constraints. We evaluate our system on a number of real world data sets and measure the size versus accuracy trade-off.

show abstract

CP-Logic Theory Inference with Contextual Variable Elimination and Comparison to BDD Based Inference Methods

Meert

Struyf

Blockeel

2010

View full text Add to dashboard Cite

There is a growing interest in languages that combine probabilistic models with logic to represent complex domains involving uncertainty. Causal probabilistic logic (CP-logic), which has been designed to model causal processes, is such a probabilistic logic language. This paper investigates inference algorithms for CP-logic; these are crucial for developing learning algorithms. It proposes a new CP-logic inference method based on contextual variable elimination and compares this method to variable elimination and to methods based on binary decision diagrams.

show abstract

Learning Predictive Clustering Rules

Ženko

Džeroski

Struyf

2006

View full text Add to dashboard Cite

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Jan Struyf

Decision trees for hierarchical multi-label classification

Tree ensembles for predicting structured outputs

Predicting gene function using hierarchical multi-label decision tree ensembles

Ensembles of Multi-Objective Decision Trees

Decision Trees for Hierarchical Multilabel Classification: A Case Study in Functional Genomics

Constraint Based Induction of Multi-objective Regression Trees

CP-Logic Theory Inference with Contextual Variable Elimination and Comparison to BDD Based Inference Methods

Learning Predictive Clustering Rules

Contact Info

Product

Resources

About