Given a large dimensional input and output space, even simple regression is prohibitively costly. Dimensionality reduction in the output space is important for efficient learning and prediction as modern paradigms, e.g. topic modelling, image classification, etc., have extremely large output spaces. In contrast to input dimensionality reduction, dimension reduction in output side is complicated. We propose, mutual information based output dimensionality reduction, that takes into account the relationship between the input and the output which is essential for regression and classification problems. Our method selects those labels to form the compressed label space that typically have the maximum mutual information with the input. Selecting the best subset is computationally hard, but we provide a polynomial time algorithm with provable approximation guarantee. We conduct experiments on seven multi-label classification datasets. Results show our method performs better than existing methods on some datasets.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.