Omics data contain signals from the molecular, physical, and kinetic inter- and intracellular interactions that control biological systems. Matrix factorization (MF) techniques can reveal low-dimensional structure in the high-dimensional data that reflect these interactions. These techniques can uncover new biological knowledge from diverse high-throughput omics data in applications ranging from pathway discovery to time-course analysis. We review exemplary applications of MF for systems-level analyses, discuss appropriate applications of these methods and their limitations, and focus on the analysis of results to facilitate optimal biological interpretation. The inference of biologically relevant features with MF enables discovery from high-throughput data beyond the limits of current biological knowledge, answering questions from high-dimensional data that we have not yet thought to ask.
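As a minimal illustration of the matrix factorization idea described above, the sketch below factors a genes-by-samples matrix into a rank-k "amplitude" matrix and a "pattern" matrix via truncated SVD. This is a generic example, not a method from any of the reviewed papers; the function name `factorize` and the data layout are assumptions for illustration.

```python
import numpy as np

def factorize(D, k):
    """Rank-k matrix factorization via truncated SVD: D ~ A @ P.

    D: (genes, samples) data matrix. A holds k gene-level patterns
    (loadings scaled by singular values); P holds the corresponding
    sample-level patterns.
    """
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    A = U[:, :k] * s[:k]   # gene loadings scaled by singular values
    P = Vt[:k]             # sample patterns
    return A, P
```

In practice, omics-oriented MF methods often replace the SVD with constrained variants (e.g. non-negativity or sparsity) to make the factors more biologically interpretable.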
We study PCA, PLS, and CCA as stochastic optimization problems: optimizing a population objective based on a sample. We suggest several stochastic approximation (SA) methods for PCA and PLS, and investigate their empirical performance.
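The population CCA objective that such stochastic methods approximate can be stated concretely: find linear projections of two paired views that are maximally correlated. A minimal batch-mode sketch via SVD of the whitened cross-covariance follows; this is an illustration of the objective, not the SA methods proposed in the paper, and the function name and ridge term `reg` are assumptions.

```python
import numpy as np

def linear_cca(X, Y, k, reg=1e-4):
    """Classical linear CCA via SVD of the whitened cross-covariance.

    X: (n, dx), Y: (n, dy) paired views. Returns projections
    A (dx, k), B (dy, k) and the top-k canonical correlations.
    """
    n = X.shape[0]
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root via symmetric eigendecomposition
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Wx, Wy = inv_sqrt(Sxx), inv_sqrt(Syy)
    U, s, Vt = np.linalg.svd(Wx @ Sxy @ Wy)
    return Wx @ U[:, :k], Wy @ Vt[:k].T, s[:k]
```

The stochastic-approximation view replaces these full-sample covariance estimates with incremental updates from one example (or a small minibatch) at a time.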
It has been previously shown that, when both acoustic and articulatory training data are available, it is possible to improve phonetic recognition accuracy by learning acoustic features from this multi-view data with canonical correlation analysis (CCA). In contrast with previous work based on linear or kernel CCA, we use the recently proposed deep CCA, where the functional form of the feature mapping is a deep neural network. We apply the approach on a speaker-independent phonetic recognition task using data from the University of Wisconsin X-ray Microbeam Database. Using a tandem-style recognizer on this task, deep CCA features improve over earlier multi-view approaches as well as over articulatory inversion and typical neural network-based tandem features. We also present a new stochastic training approach for deep CCA, which produces both faster training and better-performing features.

Index Terms: multi-view learning, neural networks, deep canonical correlation analysis, XRMB, articulatory measurements

INTRODUCTION

Modern speech recognizers often use deep neural networks (DNNs) trained to predict the posterior probabilities of phonetic states [1]. In the two most common approaches, either (1) the DNN outputs are scaled by the state priors and used as an observation model in a hidden Markov model (HMM)-based recognizer (the hybrid approach [2]), or (2) the outputs of some layer of the network (possibly a narrow "bottleneck" layer or the final layer) are post-processed and used as acoustic features in an HMM system with a Gaussian mixture model (GMM) observation distribution (the tandem approach [3]).
Working within the tandem approach, we investigate whether we can learn better DNN-based acoustic features via unsupervised learning using an external set of unlabeled multi-view data, in our case simultaneously recorded audio and articulatory measurements. The idea of feature learning using multi-view data has been explored previously using canonical correlation analysis (CCA) [4] and its nonlinear extension kernel CCA (KCCA) [5, 6]. Here we propose to use the recently developed deep CCA (DCCA) approach, which differs from linear/kernel CCA in that the feature mapping is implemented with a DNN rather than a linear/kernel function. Given the earlier successes of CCA/KCCA, and the general success of DNNs for speech tasks, it is a natural question whether multi-view feature learning can benefit from the more flexible functional form of a DNN. We investigate this question, using data from the University of Wisconsin X-ray Microbeam Database (XRMB) [7], on speaker-independent phonetic recognition in a setting where no articulatory data is available for the recognizer training speakers. We find that DCCA indeed improves over previous CCA-based features.
Multiview LSA (MVLSA) is a generalization of Latent Semantic Analysis (LSA) that supports the fusion of arbitrary views of data and relies on Generalized Canonical Correlation Analysis (GCCA). We present an algorithm for fast approximate computation of GCCA, which, when coupled with methods for handling missing values, is general enough to approximate some recent algorithms for inducing vector representations of words. Experiments across a comprehensive collection of test sets show our approach to be competitive with the state of the art.
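For reference, the classical (MAX-VAR) GCCA that such fast methods approximate finds an orthonormal shared representation G closest, in aggregate, to the column spaces of all views. A small dense sketch follows, using an exact eigendecomposition rather than the paper's fast approximation; the function name, centered-views requirement, and ridge term `reg` are assumptions for illustration.

```python
import numpy as np

def gcca(views, k, reg=1e-4):
    """MAX-VAR GCCA: shared representation G plus per-view maps.

    views: list of centered (n, d_j) matrices. Returns G (n, k) with
    orthonormal columns and per-view projection matrices U_j (d_j, k)
    such that each X_j @ U_j approximates G.
    """
    n = views[0].shape[0]
    M = np.zeros((n, n))
    for X in views:
        # Ridge-regularized projection onto the column space of each view
        M += X @ np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T)
    w, V = np.linalg.eigh(M)
    G = V[:, -k:][:, ::-1]  # top-k eigenvectors of the summed projections
    Us = [np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ G)
          for X in views]
    return G, Us
```

The n-by-n eigenproblem here is what makes the exact solution expensive at scale, and is what MVLSA's approximate algorithm avoids.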
We present Deep Generalized Canonical Correlation Analysis (DGCCA), a method for learning nonlinear transformations of arbitrarily many views of data, such that the resulting transformations are maximally informative of each other. While methods for nonlinear two-view representation learning (Deep CCA; Andrew et al., 2013) and linear many-view representation learning (Generalized CCA; Horst, 1961) exist, DGCCA is the first CCA-style multiview representation learning technique that combines the flexibility of nonlinear (deep) representation learning with the statistical power of incorporating information from many independent sources, or views. We present the DGCCA formulation as well as an efficient stochastic optimization algorithm for solving it. We learn DGCCA representations on two distinct datasets for three downstream tasks: phonetic transcription from acoustic and articulatory measurements, and recommending hashtags and friends on a dataset of Twitter users. We find that DGCCA representations soundly beat existing methods at phonetic transcription and hashtag recommendation, and in general perform no worse than standard linear many-view techniques.
Deep CCA is a recently proposed deep neural network extension to traditional canonical correlation analysis (CCA), and has been successful for multi-view representation learning in several domains. However, stochastic optimization of the deep CCA objective is not straightforward, because it does not decouple over training examples. Previous optimizers for deep CCA are either batch-based algorithms or stochastic optimizers using large minibatches, which can have high memory consumption. In this paper, we tackle the problem of stochastic optimization for deep CCA with small minibatches, based on an iterative solution to the CCA objective, and show that we can achieve performance as good as that of previous optimizers while alleviating the memory requirement.
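To see why the deep CCA objective does not decouple over training examples, note that the correlation is a function of covariance matrices estimated over an entire (mini)batch of network outputs, unlike a classification loss that sums over examples. A hedged sketch of the minibatch estimate follows; the function name and ridge term `reg` are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def minibatch_canonical_corr(Hx, Hy, reg=1e-3):
    """Total correlation of two network outputs on one minibatch.

    Hx, Hy: (m, d) outputs of the two view networks. Every example in
    the batch enters the covariance estimates, so the objective couples
    all m examples rather than decomposing into per-example losses.
    """
    m = Hx.shape[0]
    Hx = Hx - Hx.mean(axis=0)
    Hy = Hy - Hy.mean(axis=0)
    Sxx = Hx.T @ Hx / (m - 1) + reg * np.eye(Hx.shape[1])
    Syy = Hy.T @ Hy / (m - 1) + reg * np.eye(Hy.shape[1])
    Sxy = Hx.T @ Hy / (m - 1)

    def inv_sqrt(S):
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    T = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    return np.linalg.svd(T, compute_uv=False).sum()  # sum of canonical correlations
```

Small minibatches make these covariance estimates noisy and rank-deficient, which is the difficulty the paper's iterative small-minibatch approach addresses.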
Low-dimensional vector representations are widely used as stand-ins for the text of words, sentences, and entire documents. These embeddings are used to identify similar words or make predictions about documents. In this work, we consider embeddings for social media users and demonstrate that these can be used to identify users who behave similarly or to predict attributes of users. In order to capture information from all aspects of a user's online life, we take a multiview approach, applying a weighted variant of Generalized Canonical Correlation Analysis (GCCA) to a collection of over 100,000 Twitter users. We demonstrate the utility of these multiview embeddings on three downstream tasks: user engagement, friend selection, and demographic attribute prediction.