The Koopman operator is a linear but infinite-dimensional operator that governs the evolution of scalar observables defined on the state space of an autonomous dynamical system, and is a powerful tool for the analysis and decomposition of nonlinear dynamical systems. In this manuscript, we present a data-driven method for approximating the leading eigenvalues, eigenfunctions, and modes of the Koopman operator. The method requires a data set of snapshot pairs and a dictionary of scalar observables, but does not require explicit governing equations or interaction with a "black box" integrator. We will show that this approach is, in effect, an extension of Dynamic Mode Decomposition (DMD), which has been used to approximate the Koopman eigenvalues and modes. Furthermore, if the data provided to the method are generated by a Markov process instead of a deterministic dynamical system, the algorithm approximates the eigenfunctions of the Kolmogorov backward equation, which can be considered the "stochastic Koopman operator" [1]. Finally, four illustrative examples are presented: two that highlight the quantitative performance of the method when presented with either deterministic or stochastic data, and two that show potential applications of the Koopman eigenfunctions.
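The procedure sketched in this abstract (snapshot pairs plus a dictionary of observables, no governing equations) can be illustrated with a minimal Extended DMD sketch. This is not the authors' code; the monomial dictionary, the toy linear map, and all names are illustrative choices.

```python
import numpy as np

def edmd(X, Y, dictionary):
    """Approximate Koopman eigenvalues/eigenvectors from snapshot pairs.

    X, Y : (M, d) arrays of snapshots with y_m = F(x_m).
    dictionary : maps an (M, d) array to an (M, K) array of observables.
    """
    PsiX = dictionary(X)            # dictionary evaluated on inputs, (M, K)
    PsiY = dictionary(Y)            # dictionary evaluated on outputs, (M, K)
    G = PsiX.conj().T @ PsiX        # Gram matrix
    A = PsiX.conj().T @ PsiY
    K = np.linalg.pinv(G) @ A       # finite-dimensional Koopman approximation
    mu, V = np.linalg.eig(K)        # eigenvalues; eigenfunctions are Psi @ V
    return mu, V

# Toy example: linear map x -> 0.9 x with the monomial dictionary {1, x, x^2},
# whose span is invariant, so the Koopman eigenvalues 1, 0.9, 0.81 are recovered.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
Y = 0.9 * X
dico = lambda Z: np.hstack([np.ones_like(Z), Z, Z**2])
mu, V = edmd(X, Y, dico)
print(np.sort(mu.real))  # ~ [0.81, 0.9, 1.0]
```

Because the dictionary span is invariant under this particular map, the three eigenvalues are exact up to round-off; for generic nonlinear systems they are only approximations whose quality depends on the dictionary.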
A central problem in data analysis is the low dimensional representation of high dimensional data and the concise description of its underlying geometry and density. In the analysis of large scale simulations of complex dynamical systems, where the notion of time evolution comes into play, important problems are the identification of slow variables and dynamically meaningful reaction coordinates that capture the long time evolution of the system. In this paper we provide a unifying view of these apparently different tasks, by considering a family of diffusion maps, defined as the embedding of complex (high dimensional) data onto a low dimensional Euclidean space, via the eigenvectors of suitably defined random walks on the given datasets. Assuming that the data is randomly sampled from an underlying general probability distribution p(x) = e^{-U(x)}, we show that as the number of samples goes to infinity, the eigenvectors of each diffusion map converge to the eigenfunctions of a corresponding differential operator defined on the support of the probability distribution. Different normalizations of the Markov chain on the graph lead to different limiting differential operators. Specifically, the normalized graph Laplacian leads to a backward Fokker-Planck operator with an underlying potential of 2U(x), best suited for spectral clustering. A different anisotropic normalization of the random walk leads to the backward Fokker-Planck operator with the potential U(x), best suited for the analysis of the long time asymptotics of high dimensional stochastic systems governed by a stochastic differential equation with the same potential U(x). Finally, yet another normalization leads to the eigenfunctions of the Laplace-Beltrami (heat) operator on the manifold in which the data resides, best suited for the analysis of the geometry of the dataset regardless of its possibly non-uniform density.
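The family of normalizations described above is commonly parameterized by a density exponent alpha, with alpha = 0, 1/2, 1 yielding the three limiting operators. A minimal numerical sketch under that convention follows; the Gaussian kernel, the bandwidth, and all names are illustrative choices, not the paper's code.

```python
import numpy as np

def diffusion_map(data, eps, alpha, n_evecs=3):
    """Leading eigenpairs of the random-walk matrix built from a Gaussian kernel.

    alpha = 0   : normalized graph Laplacian (limit: potential 2U)
    alpha = 1/2 : backward Fokker-Planck operator (limit: potential U)
    alpha = 1   : Laplace-Beltrami operator (geometry only)
    """
    d2 = ((data[:, None, :] - data[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / eps)                                 # Gaussian affinity
    q = W.sum(axis=1)                                     # kernel density estimate
    K = W / (q[:, None] ** alpha * q[None, :] ** alpha)   # anisotropic normalization
    P = K / K.sum(axis=1, keepdims=True)                  # row-stochastic random walk
    evals, evecs = np.linalg.eig(P)
    order = np.argsort(-evals.real)
    return evals.real[order[:n_evecs]], evecs.real[:, order[:n_evecs]]

rng = np.random.default_rng(1)
pts = rng.normal(size=(150, 2))
lam, psi = diffusion_map(pts, eps=0.5, alpha=0.5)
# lam[0] = 1 with a constant eigenvector (P is row-stochastic);
# the subsequent eigenvectors provide the diffusion-map coordinates.
```

Since P is conjugate to a symmetric matrix, its spectrum is real; the embedding of point i is (lam[1] * psi[i, 1], lam[2] * psi[i, 2], ...), raised to a power set by the diffusion time.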
Abstract. The concise representation of complex high dimensional stochastic systems via a few reduced coordinates is an important problem in computational physics, chemistry and biology. In this paper we use the first few eigenfunctions of the backward Fokker-Planck diffusion operator as a coarse-grained low dimensional representation for the long term evolution of a stochastic system, and show that they are optimal under a certain mean squared error criterion. We denote the mapping from physical space to these eigenfunctions as the diffusion map. While in high dimensional systems these eigenfunctions are difficult to compute numerically by conventional methods such as finite differences or finite elements, we describe a simple computational data-driven method to approximate them from a large set of simulated data. Our method is based on defining an appropriately weighted graph on the set of simulated data, and computing the first few eigenvectors and eigenvalues of the corresponding random walk matrix on this graph. Thus, our algorithm incorporates the local geometry and density at each point into a global picture that merges in a natural way data from different simulation runs. Furthermore, we describe lifting and restriction operators between the diffusion map space and the original space. These operators facilitate the description of the coarse-grained dynamics, possibly in the form of a low-dimensional effective free energy surface parameterized by the diffusion map reduction coordinates. They also enable a systematic exploration of such effective free energy surfaces through the design of additional "intelligently biased" computational experiments. We conclude by demonstrating our method on a few examples.

Key words. Diffusion maps, dimensional reduction, stochastic dynamical systems, Fokker-Planck operator, metastable states, normalized graph Laplacian.

AMS subject classifications. 60H10, 60J60, 62M05.

1. Introduction.
Systems of stochastic differential equations (SDEs) are commonly used as models for the time evolution of many chemical, physical and biological systems of interacting particles [22,45,52]. There are two main approaches to the study of such systems. The first is by detailed Brownian Dynamics (BD) or other stochastic simulations, which follow the motion of each particle (or more generally variable) in the system and generate one or more long trajectories. The second is via analysis of the time evolution of the probability densities of these trajectories using the numerical solution of the corresponding time dependent Fokker-Planck (FP) partial differential equation. For typical high dimensional systems, both approaches suffer from severe limitations when applied directly. The main limitation of standard BD simulations is the scale gap between the atomistic time scale of single particle motions, at which the SDEs are formulated, and the macroscopic time scales that characterize the long term evolution and equilibration of these systems. This scale gap puts severe constraints on detailed simulations...
We employ the diffusion map approach as a nonlinear dimensionality reduction technique to extract a dynamically relevant, low-dimensional description of n-alkane chains in the ideal-gas phase and in aqueous solution. In the case of C8 we find the dynamics to be governed by torsional motions. For C16 and C24 we extract three global order parameters with which we characterize the fundamental dynamics, and determine that the low free-energy pathway of globular collapse proceeds by a "kink and slide" mechanism, whereby a bend near the end of the linear chain migrates toward the middle to form a hairpin and, ultimately, a coiled helix. The low-dimensional representation is subtly perturbed in the solvated phase relative to the ideal gas, and its geometric structure is conserved between C16 and C24. The methodology is directly extensible to biomolecular self-assembly processes, such as protein folding.

It has long been suspected that cooperative couplings between degrees of freedom render the effective dimensionality of biophysical systems far less than the 3R-dimensional coordinate space of the R constituent atoms (1-5). This has been framed in the projection operator formalism (6) as a separation of time scales in which the important dynamics reside in a "slow subspace" (7) and is associated with a smooth underlying free energy surface (8). For example, two-dimensional descriptions have been formulated for dialanine (9) and a coarse-grained model of the src homology 3 domain (5). Calculation of the effective dimensionality of a dynamical system, and identification of order parameters describing the low-dimensional "intrinsic manifold" to which the system dynamics are effectively restrained, is a long-standing problem in fields as seemingly disparate as data visualization (10), speech recognition (11), semisupervised learning (12), and spectral clustering (13).
The fraction of native contacts (Q) (8, 14) and the folding probability (P_fold) (8, 15) have been used as reaction coordinates for protein folding, but such coarse variables may lump together structurally and kinetically disparate conformations and can prove inadequate for larger proteins with frustrated folding funnels (5, 8). Empirical order parameters also tend to perform poorly on landscapes exhibiting multiple local free-energy (FE) minima or lacking well-defined unfolded and folded basins. Principal components analysis (PCA) is a popular linear dimensionality reduction technique applied extensively to biophysical systems (1-4, 16), which seeks to describe the "essential subspace" (2) of the dynamics by a set of orthogonal vectors oriented along the directions of largest variance in the data. For the highly nonlinear intrinsic manifolds one expects for complex molecular systems (5), the linearity of this technique renders it appropriate in local regions, but results in a poor characterization of the global features (5, 17). This deficiency leads to poor PCA estimates of the effective dimensionality (17) far in excess of the dimensionality of the phas...
The best available descriptions of systems often come at a fine level (atomistic, stochastic, microscopic, agent based), whereas the questions asked and the tasks required by the modeler (prediction, parametric analysis, optimization, and control) are at a much coarser, macroscopic level. Traditional modeling approaches start by deriving macroscopic evolution equations from microscopic models, and then bring an arsenal of computational tools to bear on these macroscopic descriptions. Over the last few years, with several collaborators, we have developed and validated a mathematically inspired, computational enabling technology that allows the modeler to perform macroscopic tasks acting on the microscopic models directly. We call this the "equation-free" approach, since it circumvents the step of obtaining accurate macroscopic descriptions. The backbone of this approach is the design of computational "experiments". In traditional numerical analysis, the main code "pings" a subroutine containing the model, and uses the returned information (time derivatives, etc.) to perform computer-assisted analysis. In our approach the same main code "pings" a subroutine that runs an ensemble of appropriately initialized computational experiments from which the same quantities are estimated. Traditional continuum numerical algorithms can thus be viewed as protocols for experimental design (where "experiment" means a computational experiment set up and performed with a model at a different level of description). Ultimately, what makes it all possible is the ability to initialize computational experiments at will. Short bursts of appropriately initialized computational experimentation, through matrix-free numerical analysis and systems-theory tools like estimation, bridge microscopic simulation with macroscopic modeling.
If enough control authority exists to initialize laboratory experiments "at will", this computational enabling technology can lead to experimental protocols for the equation-free exploration of complex system dynamics.

The Equation-Free Approach

A persistent feature of many complex systems is the emergence of macroscopic, coherent behavior from the interactions of microscopic agents such as molecules, cells, or individuals in a population. The implication is that macroscopic rules (a description of the system at a coarse-grained, high level) can somehow be deduced from microscopic ones (a description at a much finer level). For laminar Newtonian fluid mechanics, a successful coarse-grained description (the Navier-Stokes equations) was known on a phenomenological basis long before its approximate derivation from kinetic theory. Today, we must frequently study systems for which the physics can be modeled at a microscopic, fine scale; yet, it is practically impossible to derive a good macroscopic description from the microscopic rules. Hence, we look to the computer to explore the macroscopic behavior, based on the microscopic description.
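The "short bursts plus estimation" loop described above can be made concrete with a coarse projective integration sketch. The microscopic simulator here is a deliberately trivial stand-in (an ensemble of noisy relaxation paths dx = -x dt + sigma dW), and every name and parameter is a hypothetical choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def micro_burst(x0, dt, n_steps, n_replicas=500):
    """Stand-in microscopic simulator: ensemble of Euler-Maruyama paths of
    dx = -x dt + sigma dW, returning the coarse variable (the ensemble mean)."""
    x = np.full(n_replicas, x0, dtype=float)
    traj = [x.mean()]
    for _ in range(n_steps):
        x += -x * dt + 0.1 * np.sqrt(dt) * rng.standard_normal(n_replicas)
        traj.append(x.mean())
    return np.array(traj)

def projective_step(x0, dt, n_inner, big_step):
    """Run a short burst, estimate d<x>/dt from its tail, then leap forward."""
    traj = micro_burst(x0, dt, n_inner)
    slope = (traj[-1] - traj[-2]) / dt     # estimated coarse time derivative
    return traj[-1] + big_step * slope     # projective (forward-Euler) jump

x, t = 1.0, 0.0
for _ in range(20):
    x = projective_step(x, dt=0.01, n_inner=10, big_step=0.1)
    t += 10 * 0.01 + 0.1                   # burst time plus projective leap
# the coarse variable relaxes toward the fixed point 0, as exp(-t) would
```

The main code never sees the microscopic rule; it only "pings" micro_burst and works with the estimated coarse derivative, which is the essence of the equation-free design.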
Among the most striking aspects of the movement of many animal groups are their sudden coherent changes in direction. Recent observations of locusts and starlings have shown that this directional switching is an intrinsic property of their motion. Similar direction switches are seen in self-propelled particle and other models of group motion. Comprehending the factors that determine such switches is key to understanding the movement of these groups. Here, we adopt a coarse-grained approach to the study of directional switching in a self-propelled particle model assuming an underlying one-dimensional Fokker-Planck equation for the mean velocity of the particles. We continue with this assumption in analyzing experimental data on locusts and use a similar systematic Fokker-Planck equation coefficient estimation approach to extract the relevant information for the assumed Fokker-Planck equation underlying that experimental data. In the experiment itself the motion of groups of 5 to 100 locust nymphs was investigated in a homogeneous laboratory environment, helping us to establish the intrinsic dynamics of locust marching bands. We determine the mean time between direction switches as a function of group density for the experimental data and the self-propelled particle model. This systematic approach allows us to identify key differences between the experimental data and the model, revealing that individual locusts appear to increase the randomness of their movements in response to a loss of alignment by the group. We give a quantitative description of how locusts use noise to maintain swarm alignment. 
We discuss further how properties of individual animal behavior, inferred by using the Fokker-Planck equation coefficient estimation approach, can be implemented in the self-propelled particle model to replicate qualitatively the group level dynamics seen in the experimental data.

collective behavior | locusts | density-dependent switching | coarse-graining | swarming

While recent years have seen an explosion in the number of simulation models of moving animal groups, there is little detailed comparison between these models and experimental data (1, 2). The models usually produce motion that "looks like" that of a swarm of locusts, a school of fish, or a flock of birds, but the similarities are difficult to quantify (3). Furthermore, the simulation models themselves are often difficult to understand from a mathematical viewpoint since, by their nature, they resist simple mean-field descriptions. These complications make it difficult to use models to predict, for example, the rate at which groups change direction of travel or how spatial patterns evolve through time (4, 5). We are left with a multitude of models, all of which seem to relate to the available experimental data, but none of which provide clear predictive power. One approach to the problem of linking experimental data to model behavior is the detailed study of the local interactions between animals. This approach has yielded better understanding of the rules ...
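Fokker-Planck coefficient estimation of the kind used above amounts to computing drift and diffusion from conditional moments of increments (a Kramers-Moyal style estimator). A minimal sketch follows; the Ornstein-Uhlenbeck test signal, the binning choices, and all names are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np

def estimate_fp_coefficients(z, dt, n_bins=20):
    """Estimate drift a(z) and diffusion b(z) of dz = a dt + sqrt(2 b) dW
    from binned conditional first and second moments of the increments."""
    dz = np.diff(z)
    z0 = z[:-1]
    edges = np.linspace(z0.min(), z0.max(), n_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    drift = np.full(n_bins, np.nan)
    diff = np.full(n_bins, np.nan)
    for i in range(n_bins):
        mask = (z0 >= edges[i]) & (z0 < edges[i + 1])
        if mask.sum() > 10:                               # skip under-sampled bins
            drift[i] = dz[mask].mean() / dt               # first conditional moment
            diff[i] = (dz[mask] ** 2).mean() / (2 * dt)   # second conditional moment
    return centers, drift, diff

# Synthetic test signal: Ornstein-Uhlenbeck process dz = -z dt + sqrt(2 * 0.5) dW,
# so the true drift is a(z) = -z and the true diffusion is b(z) = 0.5.
rng = np.random.default_rng(3)
dt, n = 0.01, 200_000
z = np.empty(n)
z[0] = 0.0
for k in range(n - 1):
    z[k + 1] = z[k] - z[k] * dt + np.sqrt(2 * 0.5 * dt) * rng.standard_normal()
centers, drift, diff = estimate_fp_coefficients(z, dt)
# in well-sampled bins, drift tracks -z and diff stays near 0.5
```

Applied to the mean velocity of a locust group or a particle model, the same estimator yields the state-dependent drift and noise terms of the assumed one-dimensional Fokker-Planck description.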
We present a "coarse molecular dynamics" approach and apply it to studying the kinetics and thermodynamics of a peptide fragment dissolved in water. Short bursts of appropriately initialized simulations are used to infer the deterministic and stochastic components of the peptide motion parametrized by an appropriate set of coarse variables. Techniques from traditional numerical analysis (Newton-Raphson, coarse projective integration) are thus enabled; these techniques help analyze important features of the free-energy landscape (coarse transition states, eigenvalues and eigenvectors, transition rates, etc.). Reverse integration of (irreversible) expected coarse variables backward in time can assist escape from free energy minima and trace low-dimensional free energy surfaces. To illustrate the "coarse molecular dynamics" approach, we combine multiple short (0.5ps) replica simulations to map the free energy surface of the "alanine dipeptide" in water, and to determine the ¢ ¡ ¤ £ ¦ ¥ § ¡ © rate of interconversion between the two stable configurational basins corresponding to the -helical and extended minima.