Coclustering targets on grouping the samples (e.g.,documents and users) and the features (e.g., words and ratings) simultaneously. It employs the dual relation and the bilateral information between the samples and features. In many real-world applications, data usually reside on a submanifold of the ambient Euclidean space, but it is nontrivial to estimate the intrinsic manifold of the data space in a principled way. In this paper, we focus on improving the coclustering performance via manifold ensemble learning, which is able to maximally approximate the intrinsic manifolds of both the sample and feature spaces. To achieve this, we develop a novel coclustering algorithm called relational multimanifold coclustering based on symmetric nonnegative matrix trifactorization, which decomposes the relational data matrix into three submatrices. This method considers the intertype relationship revealed by the relational data matrix and also the intratype information reflected by the affinity matrices encoded on the sample and feature data distributions. Specifically, we assume that the intrinsic manifold of the sample or feature space lies in a convex hull of some predefined candidate manifolds. We want to learn a convex combination of them to maximally approach the desired intrinsic manifold. To optimize the objective function, the multiplicative rules are utilized to update the submatrices alternatively. In addition, both the entropic mirror descent algorithm and the coordinate descent algorithm are exploited to learn the manifold coefficient vector. Extensive experiments on documents, images, and gene expression data sets have demonstrated the superiority of the proposed algorithm compared with other well-established methods.
Document summarization is of great value to many real world applications, such as snippets generation for search results and news headlines generation. Traditionally, document summarization is implemented by extracting sentences that cover the main topics of a document with a minimum redundancy. In this paper, we take a different perspective from data reconstruction and propose a novel framework named Document Summarization based on Data Reconstruction (DSDR). Specifically, our approach generates a summary which consist of those sentences that can best reconstruct the original document. To model the relationship among sentences, we introduce two objective functions: (1) linear reconstruction, which approximates the document by linear combinations of the selected sentences; (2) nonnegative linear reconstruction, which allows only additive, not subtractive, linear combinations. In this framework, the reconstruction error becomes a natural criterion for measuring the quality of the summary. For each objective function, we develop an efficient algorithm to solve the corresponding optimization problem. Extensive experiments on summarization benchmark data sets DUC 2006 and DUC 2007 demonstrate the effectiveness of our proposed approach.
Co-clustering targets on grouping the samples and features simultaneously. It takes advantage of the duality between the samples and features. In many real-world applications, the data points or features usually reside on a submanifold of the ambient Euclidean space, but it is nontrivial to estimate the intrinsic manifolds in a principled way. In this study, we focus on improving the co-clustering performance via manifold ensemble learning, which aims to maximally approximate the intrinsic manifolds of both the sample and feature spaces. To achieve this, we develop a novel co-clustering algorithm called Relational Multi-manifold Coclustering (RMC) based on symmetric nonnegative matrix tri-factorization, which decomposes the relational data matrix into three matrices. This method considers the intertype relationship revealed by the relational data matrix and the intra-type information reflected by the affinity matrices. Specifically, we assume the intrinsic manifold of the sample or feature space lies in a convex hull of a group of pre-defined candidate manifolds. We hope to learn an appropriate convex combination of them to approach the desired intrinsic manifold. To optimize the objective, the multiplicative rules are utilized to update the factorized matrices and the entropic mirror descent algorithm is exploited to automatically learn the manifold coefficients. Experimental results demonstrate the superiority of the proposed algorithm.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.