Tejaswini Mallavarapu scite author profile

Background: Understanding the complex biological mechanisms of cancer patient survival using genomic and clinical data is vital, not only for developing new treatments but also for improving survival prediction. However, highly nonlinear, high-dimension, low-sample-size (HDLSS) data pose computational challenges for conventional survival analysis. Results: We propose a novel, biologically interpretable, pathway-based sparse deep neural network, named Cox-PASNet, which integrates high-dimensional gene expression data and clinical data in a simple neural network architecture for survival analysis. Cox-PASNet is biologically interpretable in that nodes in the neural network correspond to genes and pathways, and it captures the nonlinear and hierarchical effects of biological pathways associated with cancer patient survival. We also propose a heuristic optimization solution to train Cox-PASNet with HDLSS data. Cox-PASNet was evaluated extensively by comparing its predictive performance with that of current state-of-the-art methods on glioblastoma multiforme (GBM) and ovarian serous cystadenocarcinoma (OV) cancer data. In the experiments, Cox-PASNet outperformed the benchmark methods. Moreover, the neural network architecture of Cox-PASNet was biologically interpreted, and several significant prognostic factors among genes and biological pathways were identified. Conclusions: Cox-PASNet models biological mechanisms in the neural network by incorporating biological pathway databases and sparse coding. The neural network of Cox-PASNet can identify nonlinear and hierarchical associations of genomic and clinical data with cancer patient survival. The open-source PyTorch implementation of Cox-PASNet, covering training, evaluation, and model interpretation, is available at: https://github.com/DataX-JieHao/Cox-PASNet.
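
The idea that nodes correspond to genes and pathways can be illustrated with a masked layer, in which a gene feeds a pathway node only if it is annotated to that pathway. The following is a minimal sketch in plain Python, not the authors' PyTorch implementation; the function name `pathway_layer`, the `tanh` activation, and the toy genes, weights, and memberships are all assumptions made for illustration.

```python
import math

def pathway_layer(gene_values, weights, mask):
    """Map a gene layer to a pathway layer through a membership mask.

    gene_values:   expression value per gene
    weights[p][g]: weight from gene g to pathway node p
    mask[p][g]:    1 if gene g is annotated to pathway p, else 0
    The tanh activation is an assumption, not taken from the abstract.
    """
    outputs = []
    for w_row, m_row in zip(weights, mask):
        # Masked weighted sum: genes outside the pathway contribute nothing.
        z = sum(w * m * x for w, m, x in zip(w_row, m_row, gene_values))
        outputs.append(math.tanh(z))
    return outputs

# Hypothetical toy data: 4 genes, 2 pathways.
genes = [0.5, -1.0, 2.0, 0.1]
mask = [
    [1, 1, 0, 0],  # pathway 0 contains genes 0 and 1
    [0, 0, 1, 1],  # pathway 1 contains genes 2 and 3
]
weights = [
    [0.2, 0.4, 9.9, 9.9],   # the 9.9 entries are masked out
    [9.9, 9.9, 0.3, -0.5],
]
pathway_activations = pathway_layer(genes, weights, mask)
```

Because the mask zeroes out every connection that has no support in the pathway database, each pathway node can be read as a summary of its member genes, which is what makes the layer interpretable and sparse.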

Hi-LASSO: High-Dimensional LASSO

Youngsoon

Hao

Mallavarapu

et al. 2019

IEEE Access

High-throughput genomic technologies are leading to a paradigm shift in computational biology research. Computational analysis of high-dimensional data and its interpretation are essential for understanding complex biological systems. Most biological data (e.g., gene expression and DNA sequence data) are high-dimensional but contain far fewer samples than predictors. Such high-dimension, low-sample-size (HDLSS) data often cause computational challenges in biological data analysis. A number of least absolute shrinkage and selection operator (LASSO) methods have been widely used to identify biomarkers or prognostic factors in bioinformatics. The LASSO solution has been improved through the development of LASSO derivatives, including elastic-net, adaptive LASSO, relaxed LASSO, VISA, random LASSO, and recursive LASSO. However, the existing LASSO solutions have several known limitations: multicollinearity (particularly with different signs), subset size limitation, and the lack of a statistical test of significance. We propose a high-dimensional LASSO (Hi-LASSO) that theoretically improves a LASSO model, providing better prediction and feature selection performance on extremely high-dimensional data. Hi-LASSO alleviates the bias introduced by bootstrapping, refines importance scores, improves performance by taking advantage of the global oracle property, provides a statistical strategy to determine the number of bootstrap samples, and allows tests of significance for feature selection with an appropriate distribution. The performance of Hi-LASSO was assessed by comparing it with existing state-of-the-art LASSO methods in extensive simulation experiments with multiple data settings. Hi-LASSO was also applied to survival analysis with GBM gene expression data.
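
For readers unfamiliar with the base method, the sketch below fits a plain LASSO by cyclic coordinate descent, minimising (1/(2n))·||y − Xβ||² + λ·||β||₁. It is not Hi-LASSO: the bootstrapping, importance scores, and significance tests described above are not shown. The function names and the toy data are assumptions made for illustration.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Plain LASSO via cyclic coordinate descent (illustrative, unoptimised)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # residual = y - X @ beta, and beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta

# Hypothetical toy data: feature 0 is strongly related to y, feature 1 only weakly.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
beta = lasso_coordinate_descent(X, y, lam=0.2)
```

The soft-thresholding step is what performs feature selection: a coefficient whose correlation with the residual stays below λ is set exactly to zero, not merely shrunk.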

Pathway-based deep clustering for molecular subtyping of cancer

Mallavarapu

Hao

Youngsoon

et al. 2020

Methods

Cancer is a genetic disease comprising multiple subtypes that have distinct molecular characteristics and clinical features. Cancer subtyping helps improve personalized treatment and decision-making, as different cancer subtypes respond differently to treatment. The increasing availability of cancer-related genomic data provides the opportunity to identify molecular subtypes. Several unsupervised machine learning techniques have been applied to molecular data from tumor samples to identify cancer subtypes that are genetically and clinically distinct. However, most clustering methods fail to cluster patients efficiently because of the challenges imposed by high-throughput genomic data and its nonlinearity. In this paper, we propose a pathway-based deep clustering method (PACL) for molecular subtyping of cancer, which incorporates gene expression data and a biological pathway database to group patients into cancer subtypes. The main contribution of our model is to discover high-level representations of biological data by learning the complex hierarchical and nonlinear effects of pathways. We compared the performance of our model with a number of benchmark clustering methods recently proposed for cancer subtyping. We used logrank tests to assess the hypothesis that clusters (subtypes) are associated with different survival outcomes. PACL showed the lowest logrank p-value among the benchmark methods. This demonstrates that the patient groups clustered by PACL may correspond to subtypes that are significantly associated with distinct survival distributions. Moreover, PACL provides a solution to comprehensively identify subtypes and interpret the model at the biological pathway level. The open-source software of PACL in PyTorch is publicly available at https://github.com/tmallava/PACL.
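
The logrank evaluation mentioned above can be sketched for the two-group case as follows. This is a minimal illustration in plain Python, assuming the standard logrank statistic with one degree of freedom; it is not taken from the PACL code, and a comparison of more than two clusters would need the multi-group form of the test. The function name and the toy survival times are hypothetical.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group logrank test; returns (chi-square statistic, p-value).

    times_*:  survival or censoring time per patient
    events_*: 1 if the event (death) was observed, 0 if censored
    """
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e}
        | {t for t, e in zip(times_b, events_b) if e}
    )
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        d_b = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        n, d = n_a + n_b, d_a + d_b
        # Expected events in group A if both groups shared one hazard.
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    stat = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical clusters: group A dies early, group B survives much longer.
stat, p_value = logrank_test([1, 2, 3, 4], [1, 1, 1, 1],
                             [10, 11, 12, 13], [1, 1, 1, 1])
```

A smaller p-value indicates that the survival curves of the two clusters differ more than chance would explain, which is the sense in which a clustering method with a lower logrank p-value is judged to separate patients better.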

Cox-PASNet: Pathway-based Sparse Deep Neural Network for Survival Analysis

Hao

Youngsoon

Mallavarapu

et al. 2018

R-PathCluster: Identifying cancer subtype of glioblastoma multiforme using pathway-based restricted boltzmann machine

Mallavarapu

Youngsoon

et al. 2017

A federated approach for fine-grained classification of fashion apparel

Mallavarapu

Cranfill

Kim

et al. 2021

Machine Learning with Applications

A Federated Approach for Fine-Grained Classification of Fashion Apparel

Mallavarapu

Cranfill

Son

et al. 2020

Preprint

As online retail services proliferate and become pervasive in modern life, applications that classify fashion apparel features from image data are becoming increasingly indispensable. Online retailers, from leading companies to start-ups, can leverage such applications to increase profit margins and enhance the consumer experience. Many notable schemes have been proposed to classify fashion items; however, most of them focus on classifying basic-level categories, such as T-shirts, pants, skirts, shoes, bags, and so forth. In contrast to most prior efforts, this paper aims to enable an in-depth classification of fashion item attributes within the same category. Beginning with a single dress, we seek to classify the type of dress hem, the hem length, and the sleeve length. The proposed scheme comprises three major stages: (a) localization of a target item in an input image using semantic segmentation, (b) detection of human key points (e.g., the shoulder point) using a pre-trained convolutional neural network (CNN) and a bounding box, and (c) three phases that classify the attributes using a combination of algorithmic approaches and deep neural networks. The experimental results demonstrate that the proposed scheme is highly effective, with an average precision above 93.02% in every category, and that it outperforms existing CNN-based schemes.

PASCL: Pathway-based Sparse Deep Clustering for Identifying Unknown Cancer Subtypes

Mallavarapu

Hao

Youngsoon

et al. 2018
