Background: High-throughput proteomics techniques, such as mass spectrometry (MS)-based approaches, produce very high-dimensional datasets. In a clinical setting, one is often interested in how mass spectra differ between patients of different classes, for example spectra from healthy patients vs. spectra from patients having a particular disease. Machine learning algorithms are needed to (a) identify these discriminating features and (b) classify unknown spectra based on this feature set. Since the acquired data is usually noisy, the algorithms should be robust against noise and outliers, while the identified feature set should be as small as possible.
Results: We present a new algorithm, Sparse Proteomics Analysis (SPA), based on the theory of compressed sensing, that allows us to identify a minimal discriminating set of features from mass spectrometry datasets. We show (1) how our method performs on artificial and real-world datasets, (2) that its performance is competitive with standard (and widely used) algorithms for analyzing proteomics data, and (3) that it is robust against random and systematic noise. We further demonstrate the applicability of our algorithm to two previously published clinical datasets.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-017-1565-4) contains supplementary material, which is available to authorized users.
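The abstract does not reproduce the algorithm itself. As a hedged illustration of the compressed-sensing idea behind sparse feature selection, the sketch below recovers a small set of discriminating features via ℓ1-regularized regression, solved with iterative soft-thresholding (ISTA). The data dimensions, sparsity level, and regularization weight are illustrative assumptions, not SPA's actual setup.

```python
import numpy as np

def ista(A, y, lam, steps=1000):
    """Iterative soft-thresholding (ISTA) for the lasso problem
    min_x 0.5 * ||A x - y||_2^2 + lam * ||x||_1."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        z = x - A.T @ (A @ x - y) / L      # gradient step on the smooth part
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x

# Toy "spectra": 50 measurements of a 100-dimensional signal that actually
# depends on only 3 features (indices 5, 20, 60).
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 100)) / np.sqrt(50)
x_true = np.zeros(100)
x_true[[5, 20, 60]] = [1.5, -2.0, 1.0]
y = A @ x_true

x_hat = ista(A, y, lam=0.05)
support = set(np.argsort(np.abs(x_hat))[-3:])   # 3 largest-magnitude features
```

With far fewer measurements than features, the ℓ1 penalty still singles out the three truly informative indices, which is the compressed-sensing rationale for a minimal feature set.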
Two different approaches to parameter estimation (PE) in the context of polymerization are introduced, refined, combined, and applied. The first is classical PE, where one is interested in finding parameters which minimize the distance between the output of a chemical model and experimental data. The second is Bayesian PE, which allows for quantifying the parameter uncertainty caused by experimental measurement error and model imperfection. Based on detailed descriptions of the motivation, theoretical background, and methodological aspects of both approaches, their relation is outlined. The main aim of this article is to show how the two approaches complement each other and can be used together to generate a substantial information gain regarding the model and its parameters. Both approaches and their interplay are illustrated in application to polymerization reaction systems. This is the first part of a two-article series on parameter estimation for polymer reaction kinetics, with a focus on theory and methodology; a more complex example will be considered in the second part.
We present a numerical method to model dynamical systems from data. We use the recently introduced method Scalable Probabilistic Approximation (SPA) to project points from a Euclidean space to convex polytopes and represent these projected states of a system in new, lower-dimensional coordinates denoting their position in the polytope. We then introduce a specific nonlinear transformation to construct a model of the dynamics in the polytope and to transform back into the original state space. To overcome the potential loss of information from the projection to a lower-dimensional polytope, we use memory in the sense of the delay-embedding theorem of Takens. By construction, our method produces stable models. We illustrate the capacity of the method to reproduce even chaotic dynamics and attractors with multiple connected components on various examples.
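The delay-embedding step can be sketched independently of SPA. The helper below is an illustrative implementation (not the authors' code) of Takens delay coordinates for a scalar time series; in the method described above, the lower-dimensional polytope coordinates would take the place of the raw series.

```python
import numpy as np

def delay_embed(x, dim, tau=1):
    """Takens delay embedding: row t is (x[t], x[t+tau], ..., x[t+(dim-1)*tau])."""
    x = np.asarray(x)
    n = len(x) - (dim - 1) * tau          # number of complete embedding vectors
    return np.column_stack([x[k * tau : k * tau + n] for k in range(dim)])

# Embedding a simple ramp: each row collects `dim` consecutive samples.
E = delay_embed(np.arange(10), dim=3, tau=1)
```

Appending such delayed copies of the observed coordinates restores (part of) the state information that the projection discarded, which is exactly the role memory plays in the method.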
The reactivity ratios of acrylic acid (AA, M1) and its dimer β‐acroyloxypropionic acid (diAA, M2) are determined from cumulative copolymerization data by two different methods: classical parameter estimation (PE) by minimizing an objective function, and a Bayesian analysis. Classical PE gives r1 = 0.74 and r2 = 1.23 at the minimum of the residual. From the Bayesian analysis, the probability distribution of the parameter sets is obtained, revealing the existence of distinct parameter sets with nearly the same probability. The influence of the number of data points and of the size of the measurement error is discussed.
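As a minimal sketch of the classical PE step, the following fits the two reactivity ratios to synthetic composition data generated from the Mayo-Lewis equation with the values reported above (r1 = 0.74, r2 = 1.23). The data points, noise level, and optimizer are illustrative assumptions, not the study's actual measurements or procedure.

```python
import numpy as np
from scipy.optimize import minimize

def mayo_lewis(f1, r1, r2):
    """Instantaneous copolymer composition F1 (Mayo-Lewis equation)."""
    f2 = 1.0 - f1
    return (r1 * f1**2 + f1 * f2) / (r1 * f1**2 + 2 * f1 * f2 + r2 * f2**2)

# Synthetic "measurements" generated from the reported ratios plus small noise.
rng = np.random.default_rng(0)
f1_data = np.linspace(0.1, 0.9, 15)
F1_data = mayo_lewis(f1_data, 0.74, 1.23) + rng.normal(0.0, 0.005, f1_data.size)

def residual(params):
    """Sum-of-squares distance between model output and data."""
    r1, r2 = params
    return np.sum((mayo_lewis(f1_data, r1, r2) - F1_data) ** 2)

fit = minimize(residual, x0=[1.0, 1.0], method="Nelder-Mead")
r1_hat, r2_hat = fit.x
```

Minimizing the residual recovers the ratios used to generate the data; the Bayesian analysis would instead explore the full probability distribution over (r1, r2) rather than report only this minimizer.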
We investigate opinion dynamics based on an agent-based model and are interested in predicting the evolution of the percentages of the entire agent population that share an opinion. Since these opinion percentages can be seen as an aggregated observation of the full system state, the individual opinions of each agent, we view this in the framework of the Mori–Zwanzig projection formalism. More specifically, we show how to estimate a nonlinear autoregressive model (NAR) with memory from data given by a time series of opinion percentages, and discuss its prediction capacities for various specific topologies of the agent interaction network. We demonstrate that the inclusion of memory terms significantly improves the prediction quality on examples with different network topologies.
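A minimal stand-in for the memory idea, assuming a linear autoregressive model rather than the full NAR of the paper: fit x[t] as a linear combination of the last m observed values by least squares. The series below is synthetic, with a genuine memory depth of 2.

```python
import numpy as np

def fit_ar(x, m):
    """Least-squares fit of x[t] ≈ a_1 x[t-1] + ... + a_m x[t-m] + b."""
    n = len(x)
    X = np.column_stack([x[m - 1 - k : n - 1 - k] for k in range(m)]
                        + [np.ones(n - m)])        # lagged values + intercept
    coef, *_ = np.linalg.lstsq(X, x[m:], rcond=None)
    return coef                                    # (a_1, ..., a_m, b)

# Synthetic aggregated observation driven by the two previous values.
rng = np.random.default_rng(1)
x = np.zeros(500)
x[:2] = [0.3, 0.4]
for t in range(2, 500):
    x[t] = 0.5 * x[t - 1] + 0.3 * x[t - 2] + 0.01 * rng.normal()

coef = fit_ar(x, m=2)
```

With m = 2 the fit recovers both memory coefficients; fitting with m = 1 would misattribute the influence of x[t-2], which is the Mori-Zwanzig rationale for including memory terms in the reduced model.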
A statistical, data-driven method is presented that quantifies influences between variables of a dynamical system. The method is based on finding a suitable representation of points by fuzzy affiliations with respect to landmark points using the Scalable Probabilistic Approximation algorithm. This is followed by the construction of a linear mapping between these affiliations for different variables and forward in time. This linear mapping, or matrix, can be directly interpreted in light of unidirectional dependencies, and relevant properties of it are quantified. These quantifications, given by the sum of singular values and the average row variance of the matrix, then serve as measures for the influences between variables of the dynamics. The validity of the method is demonstrated with theoretical results and on several numerical examples, covering deterministic, stochastic, and delayed types of dynamics. Moreover, the method is applied to a non-classical example given by real-world basketball player movement, which exhibits highly random movement and comes without a physical intuition, contrary to many examples from, e.g., life sciences.
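The two matrix quantifications named above can be computed directly once the linear mapping between affiliations has been fitted; the sketch below computes them for two hand-picked matrices (the SPA fitting step itself is omitted, and the example matrices are illustrative).

```python
import numpy as np

def influence_measures(M):
    """Sum of singular values (nuclear norm) and average row variance of M.
    Larger values indicate stronger, more structured dependence encoded in M."""
    singular_values = np.linalg.svd(M, compute_uv=False)
    avg_row_variance = M.var(axis=1).mean()
    return singular_values.sum(), avg_row_variance

# A mapping that copies affiliations unchanged (strong, structured dependence)...
nuc_id, var_id = influence_measures(np.eye(3))
# ...versus a constant stochastic mapping that ignores its input entirely.
nuc_const, var_const = influence_measures(np.full((3, 3), 1.0 / 3.0))
```

The identity mapping scores high on both measures, while the constant mapping has zero row variance and a much smaller singular-value sum, matching the intuition that it transmits no directional influence.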
We introduced the mixed-methods Data-Powered Positive Deviance (DPPD) framework as a potential addition to the set of tools used to search for effective response strategies against the SARS-CoV-2 pandemic. For this purpose, we conducted a DPPD study in the context of the early stages of the German SARS-CoV-2 pandemic. Using a framework of scalable quantitative methods that is novel in the scientific literature on DPPD, we identified positively deviant German districts, and subsequently employed qualitative methods to identify factors that might have contributed to their comparatively successful reduction of the forward transmission rate. Our qualitative analysis suggests that quick, proactive, decisive, and flexible/pragmatic action, a willingness to take risks and deviate from standard procedures, and good information flows, both in terms of data collection and public communication, alongside the utilization of social-network effects, were deemed highly important by the interviewed districts. Our study design, with its small qualitative sample, constitutes an exploratory and illustrative effort and hence does not allow a clear causal link to be established. Thus, the results cannot necessarily be extrapolated to other districts as they are. However, the findings indicate areas for further research to assess these strategies' effectiveness in a broader study setting. We conclude by stressing DPPD's strengths regarding replicability, scalability, and adaptability, as well as its focus on local solutions, which make it a promising framework to be applied in various contexts, e.g., in the Global South.