Daniel M. Dunlavy scite author profile

Scalable tensor factorizations for incomplete data

The problem of incomplete data (i.e., data with missing or unknown values) in multi-way arrays is ubiquitous in biomedical signal processing, network traffic analysis, bibliometrics, social network analysis, chemometrics, computer vision, communication networks, etc. We consider the problem of how to factorize data sets with missing values with the goal of capturing the underlying latent structure of the data and possibly reconstructing missing values (i.e., tensor completion). We focus on one of the most well-known tensor factorizations that captures multi-linear structure, CANDECOMP/PARAFAC (CP). In the presence of missing data, CP can be formulated as a weighted least squares problem that models only the known entries. We develop an algorithm called CP-WOPT (CP Weighted OPTimization) that uses a first-order optimization approach to solve the weighted least squares problem. Based on extensive numerical experiments, our algorithm is shown to successfully factorize tensors with noise and up to 99% missing data. A unique aspect of our approach is that it scales to sparse large-scale data, e.g., 1000 × 1000 × 1000 with five million known entries (0.5% dense). We further demonstrate the usefulness of CP-WOPT on two real-world applications: a novel EEG (electroencephalogram) application where missing data is frequently encountered due to disconnections of electrodes, and the problem of modeling computer network traffic where data may be absent due to the expense of the data collection process.

Keywords: missing data, incomplete data, tensor factorization, CANDECOMP, PARAFAC, optimization

A preliminary conference version of this paper has appeared as [1].

Email addresses: evrim.acar@bte.tubitak.gov.tr (Evrim Acar), dmdunla@sandia.gov (Daniel M. Dunlavy), tgkolda@sandia.gov (Tamara G. Kolda), mm@imm.dtu.dk (Morten Mørup)
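The weighted least squares formulation described in the abstract above can be illustrated with a small sketch. This is not the authors' CP-WOPT code: the function names, the dense NumPy representation, and the assumption of a binary weight tensor W (1 for known entries, 0 for missing ones) are all choices made here for illustration. It only shows the objective and its first-order derivatives for a three-way tensor; any gradient-based optimizer could then be applied to them.

```python
import numpy as np

def cp_reconstruct(A, B, C):
    """Build the full tensor sum_r a_r o b_r o c_r from CP factor matrices."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def cp_wopt_objective(X, W, A, B, C):
    """Weighted least squares objective and its gradients (illustrative).

    W is assumed binary: 1 where X is known, 0 where it is missing, so
    missing entries contribute nothing to the value or the gradients.
    """
    R = W * (cp_reconstruct(A, B, C) - X)     # residual on known entries only
    f = 0.5 * np.sum(R ** 2)
    gA = np.einsum("ijk,jr,kr->ir", R, B, C)  # df/dA
    gB = np.einsum("ijk,ir,kr->jr", R, A, C)  # df/dB
    gC = np.einsum("ijk,ir,jr->kr", R, A, B)  # df/dC
    return f, (gA, gB, gC)

# Small synthetic example: a rank-2 tensor with roughly 40% of entries hidden.
rng = np.random.default_rng(0)
A0, B0, C0 = (rng.standard_normal((n, 2)) for n in (6, 5, 4))
X = cp_reconstruct(A0, B0, C0)
W = (rng.random(X.shape) > 0.4).astype(float)

# At the true factors the weighted residual vanishes.
f_true, _ = cp_wopt_objective(X, W, A0, B0, C0)

# At a random point the objective is positive and the gradients are nonzero.
A, B, C = (rng.standard_normal((n, 2)) for n in (6, 5, 4))
f, (gA, gB, gC) = cp_wopt_objective(X, W, A, B, C)
```

The dense weight tensor is only practical for small problems; the sparse large-scale setting mentioned in the abstract would need a representation that stores the known entries alone.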

A scalable optimization approach for fitting canonical tensor decompositions

Acar

Dunlavy

Kolda

2011

Journal of Chemometrics

258

412

Tensor decompositions are higher-order analogues of matrix decompositions and have proven to be powerful tools for data analysis. In particular, we are interested in the canonical tensor decomposition, otherwise known as CANDECOMP/PARAFAC (CP), which expresses a tensor as the sum of component rank-one tensors and is used in a multitude of applications such as chemometrics, signal processing, neuroscience and web analysis. The task of computing CP, however, can be difficult. The typical approach is based on alternating least-squares (ALS) optimization, but it is not accurate in the case of overfactoring. High accuracy can be obtained by using nonlinear least-squares (NLS) methods; the disadvantage is that NLS methods are much slower than ALS. In this paper, we propose the use of gradient-based optimization methods. We discuss the mathematical calculation of the derivatives and show that they can be computed efficiently, at the same cost as one iteration of ALS. Computational experiments demonstrate that the gradient-based optimization methods are more accurate than ALS and faster than NLS in terms of total computation time.
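For a three-way tensor, the cost comparison in the abstract above can be made concrete with the standard form of the CP objective and its partial derivative. The notation is an assumption introduced here, not taken from this page: X_(1) is the mode-1 unfolding, ⊙ is the Khatri-Rao product, and ∗ is the elementwise (Hadamard) product.

```latex
f(A,B,C) = \tfrac{1}{2}\,\bigl\| \mathcal{X} - [\![ A, B, C ]\!] \bigr\|^2,
\qquad
\frac{\partial f}{\partial A}
  = -X_{(1)} (C \odot B) + A \bigl( (C^{\mathsf T} C) \ast (B^{\mathsf T} B) \bigr)
```

The derivatives with respect to B and C follow by symmetry. The dominant term, X_(1)(C ⊙ B), is the same product that an ALS update for A requires, which is consistent with the claim that a full gradient costs about as much as one ALS iteration.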

Temporal Link Prediction Using Matrix and Tensor Factorizations

Dunlavy

Kolda

Acar

2011

ACM Trans. Knowl. Discov. Data

419

258

The data in many disciplines such as social networks, web analysis, etc. is link-based, and the link structure can be exploited for many different data mining tasks. In this paper, we consider the problem of temporal link prediction: Given link data for times 1 through T, can we predict the links at time T + 1? If our data has underlying periodic structure, can we predict out even further in time, i.e., links at time T + 2, T + 3, etc.? In this paper, we consider bipartite graphs that evolve over time and consider matrix- and tensor-based methods for predicting future links. We present a weight-based method for collapsing multi-year data into a single matrix. We show how the well-known Katz method for link prediction can be extended to bipartite graphs and, moreover, approximated in a scalable way using a truncated singular value decomposition. Using a CANDECOMP/PARAFAC tensor decomposition of the data, we illustrate the usefulness of exploiting the natural three-dimensional structure of temporal link data. Through several numerical experiments, we demonstrate that both matrix- and tensor-based techniques are effective for temporal link prediction despite the inherent difficulty of the problem. Additionally, we show that tensor-based techniques are particularly effective for temporal data with varying periodic patterns.
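One way to read the truncated-SVD approximation of the Katz method mentioned above is sketched below. It is an illustration under stated assumptions rather than the paper's exact formulation: the function name, the damping parameter beta, and the small example matrix are hypothetical. In a bipartite graph with adjacency block M, only odd-length paths connect the two node sets, so summing beta^l times the number of length-l paths gives U diag(beta s / (1 - beta^2 s^2)) V^T in terms of the SVD of M, and keeping the k largest singular values yields a low-rank score matrix.

```python
import numpy as np

def bipartite_katz_truncated(M, beta, k):
    """Rank-k approximation of Katz scores between the two node sets.

    M is the (rows x columns) bipartite adjacency matrix. The series
    converges only when beta times the largest singular value is below 1.
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    if beta * s[0] >= 1.0:
        raise ValueError("beta must be below 1 / (largest singular value)")
    U, s, Vt = U[:, :k], s[:k], Vt[:k, :]
    psi = beta * s / (1.0 - (beta * s) ** 2)  # closed form of the odd-power series
    return (U * psi) @ Vt

# Hypothetical 5 x 4 bipartite graph (e.g., authors x venues).
M = np.array([[1, 0, 1, 0],
              [0, 1, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 1],
              [1, 0, 0, 1]], dtype=float)
beta = 0.5 / np.linalg.norm(M, 2)   # keeps beta * sigma_max = 0.5
scores = bipartite_katz_truncated(M, beta, k=2)
```

Larger entries of `scores` suggest more likely links. With k equal to the number of singular values, the result matches the exact Katz scores obtained from the full bipartite adjacency matrix.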

Scalable Tensor Factorizations with Missing Data

et al. 2010

Link Prediction on Evolving Data Using Matrix and Tensor Factorizations

2009

Formulations for Surrogate-Based Optimization with Data Fit, Multifidelity, and Reduced-Order Models

Eldred

Dunlavy

2006

Surrogate-based optimization (SBO) methods have become established as effective techniques for engineering design problems through their ability to tame nonsmoothness and reduce computational expense. Possible surrogate modeling techniques include data fits (local, multipoint, or global), multifidelity model hierarchies, and reduced-order models, and each of these types has unique features when employed within SBO. This paper explores a number of SBO algorithmic variations and their effect for different surrogate modeling cases. First, general facilities for constraint management are explored through approximate subproblem formulations (e.g., direct surrogate), constraint relaxation techniques (e.g., homotopy), merit function selections (e.g., augmented Lagrangian), and iterate acceptance logic selections (e.g., filter methods). Second, techniques specialized to particular surrogate types are described. Computational results are presented for sets of algebraic test problems and an engineering design application solved using the DAKOTA software.

Poblano v1.0: a Matlab toolbox for gradient-based optimization.

Dunlavy

Acar

Kolda

2010

We present Poblano v1.0, a Matlab toolbox for solving gradient-based unconstrained optimization problems. Poblano implements three optimization methods (nonlinear conjugate gradients, limited-memory BFGS, and truncated Newton) that require only first-order derivative information. In this paper, we describe the Poblano methods, provide numerous examples of how to use Poblano, and present results of Poblano used in solving problems from a standard test collection of unconstrained optimization problems.

Homotopy optimization methods for global optimization.

Dunlavy

O’Leary

2005

We define a new method for global optimization, the Homotopy Optimization Method (HOM). This method differs from previous homotopy and continuation methods in that its aim is to find a minimizer for each of a set of values of the homotopy parameter, rather than to follow a path of minimizers. We define a second method, called HOPE, by allowing HOM to follow an ensemble of points obtained by perturbation of previous ones. We relate this new method to standard methods such as simulated annealing and show under what circumstances it is superior. We present results of extensive numerical experiments demonstrating performance of HOM and HOPE.
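The core idea of HOM described above, finding a minimizer at each of a set of homotopy parameter values, can be sketched on a toy one-dimensional problem. Everything specific here is an assumption made for illustration: the convex-combination homotopy h(x, lam) = (1 - lam) f0(x) + lam f1(x), the two example functions, and the fixed-step gradient descent standing in for a real local optimizer. The ensemble-of-perturbed-points extension (HOPE) is not shown.

```python
def f1_grad(x):
    """Derivative of the target f1(x) = (x**2 - 1)**2 + 0.3*x (two local minima)."""
    return 4.0 * x * (x * x - 1.0) + 0.3

def f0_grad(x):
    """Derivative of the easy template f0(x) = x**2, whose minimizer 0 is known."""
    return 2.0 * x

def local_minimize(grad, x, step=0.05, iters=2000):
    """Plain fixed-step gradient descent, standing in for any local optimizer."""
    for _ in range(iters):
        x -= step * grad(x)
    return x

def homotopy_minimize(num_steps=10):
    """Minimize h(x, lam) = (1 - lam)*f0(x) + lam*f1(x) for lam = 0, ..., 1,
    starting each local solve at the previous minimizer."""
    x = 0.0                      # known minimizer of f0
    path = []
    for i in range(num_steps + 1):
        lam = i / num_steps
        grad = lambda z, lam=lam: (1.0 - lam) * f0_grad(z) + lam * f1_grad(z)
        x = local_minimize(grad, x)
        path.append((lam, x))
    return path

path = homotopy_minimize()
x_final = path[-1][1]            # a stationary point of the target f1
```

On this toy problem the sequence of minimizers ends in the lower of the two wells of f1, but that outcome depends on the chosen template function and is not guaranteed in general.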
