2015
DOI: 10.1021/acs.jctc.5b00618

Weighted Distance Functions Improve Analysis of High-Dimensional Data: Application to Molecular Dynamics Simulations

Abstract: Data mining techniques depend strongly on how the data are represented and how distance between samples is measured. High-dimensional data often contain a large number of irrelevant dimensions (features) for a given query. These features act as noise and obfuscate relevant information. Unsupervised approaches to mine such data require distance measures that can account for feature relevance. Molecular dynamics simulations produce high-dimensional data sets describing molecules observed in time. Here, we propos…

Cited by 11 publications (18 citation statements)
References 67 publications
“…Especially so when compared to k-means (triangle), PCA and the auto encoder which are not as strongly affected and are more stable over varying dimensionalities. These results are consistent with [9] which shows that tICA is prone to larger errors when increasing dimensionality than other methods.…”
Section: Villin (supporting)
confidence: 89%
“…[5] Sparse coding,[6] auto encoders[7] and neighborhood embedding[8] have shown to be very effective in reducing the dimensionality of data while preserving important underlying features. Dimensionality reduction methods have also been developed specifically for molecular dynamics data by reweighing features with unsupervised methods,[9] by learning distance functions[10] and by using diffusion maps.[11] In this work we focus on comparing the performance of dimensionality reduction methods on biological simulation data.…”
Section: Introduction (mentioning)
confidence: 99%
“…[56,57] This can happen either in post-processing or directly at the level of feature selection and transformation. While the use of time-based information changes the results for Beta3S only slightly, those for BPTI are dramatically different,[45] which we find here in similar form for 3CLpro, compare Fig. 7 to Fig.…”
Section: Discussion (supporting)
confidence: 79%
“…The choice and processing of features are the most critical steps in understanding MD data, and the impact of these steps on the inferences drawn remains the biggest caveat in the field.[28,45] … during earlier stages of the work as well as Davide Garolini for interesting discussions and for the development of the R package 'CampaRi'. This work was supported financially by an excellence grant of the Swiss National Science Foundation (31003A_169007) to AC.…”
Section: Discussion (mentioning)
confidence: 99%
“…An alternative to the preprocessing of the combined dataset of TCs and band powers is offered by the adoption of locally adaptive weights (LAWs; Blöchliger et al., 2015). These weights are meant to correct for the fact that different time series may be of heterogeneous importance when the system is in different states.…”
Section: Methods (mentioning)
confidence: 99%