Scalable Tensor Mining

2016

IJMS

Self Cite

How can complex relationships among molecular or clinico-pathological entities of neurological disorders be represented and analyzed? Graphs seem to be the current answer to the question no matter the type of information: molecular data, brain images or neural signals. We review a wide spectrum of graph representation and graph analysis methods and their application in the study of both the genomic level and the phenotypic level of the neurological disorder. We find numerous research works that create, process and analyze graphs formed from one or a few data types to gain an understanding of specific aspects of the neurological disorders. Furthermore, with the increasing number of data of various types becoming available for neurological disorders, we find that integrative analysis approaches that combine several types of data are being recognized as a way to gain a global understanding of the diseases. Although there are still not many integrative analyses of graphs due to the complexity in analysis, multi-layer graph analysis is a promising framework that can incorporate various data types. We describe and discuss the benefits of the multi-layer graph framework for studies of neurological disease.

Section: Need For Integrative Analysis On Large Graphs (mentioning)

confidence: 99%

Review on Graph Clustering and Subgraph Similarity Based Analysis of Neurological Disorders

Thomas

Seo

2016

IJMS

Self Cite

“…In case of P-TUCKER-APPROX (step 5 and lines 5-6), P-TUCKER-APPROX removes "noisy" entries of G by Algorithm 4 explained in Section III-C. P-TUCKER stops iterations if the error converges or the maximum iteration is reached (line 7). Finally, P-TUCKER performs QR decomposition on all A^(n) to make them orthogonal and updates G (step 6 and lines 8-11). Specifically, QR decomposition [25] on each A^(n) is defined as follows:…”

Section: Proposed Methods (mentioning)

confidence: 99%
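The QR step mentioned in the snippet above can be illustrated with a small NumPy sketch. This is not the P-TUCKER implementation; it only shows the standard identity that replacing each factor matrix A^(n) = Q^(n) R^(n) by Q^(n) and absorbing R^(n) into the core G leaves the reconstructed tensor unchanged. The function names (`mode_n_product`, `reconstruct`, `orthogonalize`) and the tensor sizes are assumptions made for illustration.

```python
import numpy as np

def mode_n_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode` (the n-mode product)."""
    # tensordot contracts the matrix columns with the chosen tensor axis and
    # places the new axis first; moveaxis puts it back at position `mode`.
    return np.moveaxis(np.tensordot(matrix, tensor, axes=(1, mode)), 0, mode)

def reconstruct(core, factors):
    """Rebuild the full tensor from a Tucker core and its factor matrices."""
    out = core
    for mode, factor in enumerate(factors):
        out = mode_n_product(out, factor, mode)
    return out

def orthogonalize(core, factors):
    """Replace each factor by its Q from a reduced QR and push R into the core."""
    new_factors = []
    for mode, factor in enumerate(factors):
        q, r = np.linalg.qr(factor)           # q has orthonormal columns
        new_factors.append(q)
        core = mode_n_product(core, r, mode)  # G <- G x_n R^(n)
    return core, new_factors

# Hypothetical sizes: a 5 x 6 x 4 tensor with a 2 x 3 x 2 core.
rng = np.random.default_rng(0)
core = rng.standard_normal((2, 3, 2))
factors = [rng.standard_normal((5, 2)),
           rng.standard_normal((6, 3)),
           rng.standard_normal((4, 2))]
new_core, new_factors = orthogonalize(core, factors)
```

The sketch assumes every factor matrix has at least as many rows as columns, so that the reduced QR yields a square R and the core keeps its shape.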

“…Examples of such data include item ratings [1], social network [2], and web search logs [3] where most entries are missing. Tensor factorization has been used effectively for analyzing tensors [4], [5], [6], [7], [8], [9], [10]. Among tensor factorization methods [11], Tucker factorization has received much interest since it is a generalized form of other factorization methods like CANDECOMP/PARAFAC (CP) decomposition, and it allows us to examine not only latent factors but also relations hidden in tensors.…”

Section: Introduction (mentioning)

confidence: 99%

Scalable Tucker Factorization for Sparse Tensors - Algorithms and Discoveries

Park et al.

2018

2018 IEEE 34th International Conference on Data Engineering (ICDE)

Self Cite

Given sparse multi-dimensional data (e.g., (user, movie, time; rating) for movie recommendations), how can we discover latent concepts/relations and predict missing values? Tucker factorization has been widely used to solve such problems with multi-dimensional data, which are modeled as tensors. However, most Tucker factorization algorithms regard and estimate missing entries as zeros, which triggers a highly inaccurate decomposition. Moreover, few methods focusing on an accuracy exhibit limited scalability since they require huge memory and heavy computational costs while updating factor matrices. In this paper, we propose P-TUCKER, a scalable Tucker factorization method for sparse tensors. P-TUCKER performs alternating least squares with a row-wise update rule in a fully parallel way, which significantly reduces memory requirements for updating factor matrices. Furthermore, we offer two variants of P-TUCKER: a caching algorithm P-TUCKER-CACHE and an approximation algorithm P-TUCKER-APPROX, both of which accelerate the update process. Experimental results show that P-TUCKER exhibits 1.7-14.1× speed-up and 1.4-4.8× less error compared to the state-of-the-art. In addition, P-TUCKER scales near linearly with the number of observable entries in a tensor and number of threads. Thanks to P-TUCKER, we successfully discover hidden concepts and relations in a large-scale real-world tensor, while existing methods cannot reveal latent features due to their limited scalability or low accuracy.
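To make the point about missing entries concrete, the following sketch evaluates a 3-way Tucker model only at observed index triples, rather than against a dense tensor in which missing entries are filled with zeros. It is an illustration under stated assumptions, not the P-TUCKER update rule: the helper names (`predict_entries`, `observed_rmse`), the coordinate-list layout of `indices`, and all sizes are hypothetical.

```python
import numpy as np

def predict_entries(core, factors, indices):
    """Predict Tucker values at the given (i, j, k) index triples only."""
    # Pick, for each observed entry, the matching row of every factor matrix.
    a, b, c = (factor[indices[:, mode]] for mode, factor in enumerate(factors))
    # x_hat[n] = sum_{p,q,r} G[p,q,r] * a[n,p] * b[n,q] * c[n,r]
    return np.einsum("pqr,np,nq,nr->n", core, a, b, c)

def observed_rmse(core, factors, indices, values):
    """Root-mean-square error measured over the observed entries only."""
    residual = values - predict_entries(core, factors, indices)
    return float(np.sqrt(np.mean(residual ** 2)))

# Hypothetical model: a 4 x 5 x 6 tensor with a 2 x 2 x 2 core.
rng = np.random.default_rng(0)
core = rng.standard_normal((2, 2, 2))
factors = [rng.standard_normal((size, 2)) for size in (4, 5, 6)]

# Three "observed" entries, taken from the dense reconstruction for checking.
indices = np.array([[0, 1, 2], [3, 4, 5], [1, 0, 3]])
dense = np.einsum("pqr,ip,jq,kr->ijk", core, *factors)
values = dense[indices[:, 0], indices[:, 1], indices[:, 2]]
rmse = observed_rmse(core, factors, indices, values)
```

Because the error is accumulated over the index list alone, the cost of one evaluation grows with the number of observed entries rather than with the full tensor size.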

“…Thus, there is a need for multi-platform data analysis method that can scalably stratify multiple cancer types for knowledge discovery and predict clinical outcomes for enabling personalized medicine. Related works in tensor analysis: Tensors, i.e., multi-dimensional arrays, are a natural representation of multi-platform genomic data [22]. The core of tensor analysis is tensor decomposition, which can be considered as higher-order singular value decomposition (HOSVD).…”

Section: Introduction (mentioning)

confidence: 99%
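As a rough illustration of the HOSVD view mentioned in the snippet above, the sketch below computes a Tucker decomposition of a small dense tensor by taking the left singular vectors of each mode unfolding as factor matrices. It is a textbook-style sketch for dense data, not the SNeCT method (which the abstract describes as using parallel stochastic gradient descent); the names `unfold` and `hosvd` and the tensor sizes are assumptions.

```python
import numpy as np

def unfold(tensor, mode):
    """Matricize `tensor` so that axis `mode` indexes the rows."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def hosvd(tensor, ranks):
    """Truncated HOSVD: returns (core, factors) for the requested ranks."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Left singular vectors of the mode-n unfolding give the mode-n factor.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, factor in enumerate(factors):
        # Project onto each factor: core <- core x_n U_n^T
        core = np.moveaxis(np.tensordot(factor.T, core, axes=(1, mode)), 0, mode)
    return core, factors

# Hypothetical 4 x 5 x 6 tensor; full ranks reproduce it exactly.
rng = np.random.default_rng(0)
tensor = rng.standard_normal((4, 5, 6))
core, factors = hosvd(tensor, ranks=(4, 5, 6))
rebuilt = np.einsum("pqr,ip,jq,kr->ijk", core, *factors)

# Smaller ranks give a compressed core and an approximate reconstruction.
small_core, small_factors = hosvd(tensor, ranks=(2, 3, 2))
```

With full ranks the reconstruction matches the input up to floating-point error; with smaller ranks the core shrinks to the requested size and the reconstruction becomes an approximation.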

SNeCT: Scalable Network Constrained Tucker Decomposition for Multi-Platform Data Profiling

Choi

IEEE/ACM Trans. Comput. Biol. and Bioinf.

2020

Motivation: How do we integratively analyze large-scale multiplatform genomic data that are high dimensional and sparse? Furthermore, how can we incorporate prior knowledge, such as the association between genes, in the analysis systematically? Method: To solve this problem, we propose a Scalable Network Constrained Tucker decomposition method we call SNeCT. SNeCT adopts parallel stochastic gradient descent approach on the proposed parallelizable network constrained optimization function. SNeCT decomposition is applied to tensor constructed from large scale multi-platform multi-cohort cancer data, PanCan12, constrained on a network built from PathwayCommons database. Results: The decomposed factor matrices are applied to stratify cancers, to search for top-k similar patients, and to illustrate how the matrices can be used for personalized interpretation. In the stratification test, combined twelve-cohort data is clustered to form thirteen subclasses. The thirteen subclasses have a high correlation to tissue of origin in addition to other interesting observations, such as clear separation of OV cancers to two groups, and high clinical correlation within subclusters formed in cohorts BRCA and UCEC. In the top-k search, a new patient's genomic profile is generated and searched against existing patients based on the factor matrices. The similarity of the top-k patient to the query is high for 23 clinical features, including estrogen/progesterone receptor statuses of BRCA patients with average precision value ranges from 0.72 to 0.86 and from 0.68 to 0.86, respectively. We also provide an illustration of how the factor matrices can be used for interpretable personalized analysis of each patient. Availability: The code and data are available at our repository.