Dimension reduction and visualization of large high-dimensional data via interpolation

Bae, Seunghee; Choi, Jong Youl; Qiu, Judy; Fox, Geoffrey

doi:10.1145/1851476.1851501

Cited by 44 publications (41 citation statements)

References 21 publications (28 reference statements)


“…Authors have been researching on developing high performance visualization algorithms, such as parallel MDS and GTM and their interpolation extensions [7,8], to visualize large PubChem dataset in 3D space by using our in-house 3D data point visualization tool and now we extend its functionality to access external data sources in a dynamic way.…”

Section: Data Visualization and Remote Data Access (mentioning)

confidence: 99%

Browsing large scale cheminformatics data with dimension reduction

Choi, Bae, Qiu, et al. 2010

Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing

Self Cite


A tool for visualizing large-scale, high-dimensional data is highly valuable for scientific discovery in many fields. We present PubChemBrowse, a customized visualization tool for cheminformatics research. It provides a novel 3D data point browser that displays complex properties of massive data on commodity clients. As in GIS browsers for Earth and Environment data, chemical compounds with similar properties are nearby in the browser. PubChemBrowse is built around in-house high performance parallel MDS (Multi-Dimensional Scaling) and GTM (Generative Topographic Mapping) services and supports fast interaction with an external property database. These properties can be overlaid on the 3D mapped compound space or queried for individual points. We prototype its use with the Chem2Bio2RDF system, using the SPARQL query language to access over 20 publicly accessible bioinformatics databases. We describe our design and implementation of the integrated PubChemBrowse application and outline its use in drug discovery. The same core technologies can be used to develop similar high dimensional browsers in other scientific areas.


“…Also, DA-exp95 results are very similar to or even better than DA-exp99 results although DA-exp95 takes shorter time than DA-exp99 case. In future work, we will integrate these ideas with the interpolation technology described in [29] to give a robust approach to dimension reduction of large datasets that scales like O(N) rather O(N²) of general MDS methods.…”

Section: Discussion (mentioning)

confidence: 99%

Multidimensional Scaling by Deterministic Annealing with Iterative Majorization Algorithm

Bae, Qiu, Fox 2010

2010 IEEE Sixth International Conference on E-Science

Self Cite


Multidimensional Scaling (MDS) is a dimension reduction method for information visualization, which is set up as a non-linear optimization problem. It is applicable to many data intensive scientific problems including studies of DNA sequences but tends to get trapped in local minima. Deterministic Annealing (DA) has been applied to many optimization problems to avoid local minima. We apply DA approach to MDS problem in this paper and show that our proposed DA approach improves the mapping quality and shows high reliability in a variety of experimental results. Further its execution time is similar to that of the un-annealed approach. We use different data sets for comparing the proposed DA approach with both a well known algorithm called SMACOF and a MDS with distance smoothing method which aims to avoid local optima. Our proposed DA method outperforms SMACOF algorithm and the distance smoothing MDS algorithm in terms of the mapping quality and shows much less sensitivity with respect to initial configurations and stopping condition. We also investigate various temperature cooling parameters for our deterministic annealing method within an exponential cooling scheme.
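The exponential cooling scheme mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual form T_next = alpha * T and reads the labels "DA-exp95" and "DA-exp99" from the citation statement as alpha = 0.95 and alpha = 0.99; that reading, the function name `cooling_schedule`, and the start and stop temperatures are assumptions made for illustration, not values taken from the paper.

```python
def cooling_schedule(t_start, t_min, alpha):
    """Return the temperatures visited by exponential cooling.

    Assumed rule: T_next = alpha * T, stopping once T drops to t_min or below.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    temps = []
    t = t_start
    while t > t_min:
        temps.append(t)
        t *= alpha
    return temps


# Hypothetical start and stop temperatures, chosen only for illustration.
fast = cooling_schedule(1.0, 0.01, alpha=0.95)
slow = cooling_schedule(1.0, 0.01, alpha=0.99)
print(len(fast), len(slow))
```

A smaller alpha visits fewer temperature levels before reaching the stopping temperature, which is consistent with the statement that DA-exp95 takes less time than DA-exp99.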


“…It has applied linear discriminant analysis to the labeled objects in the representation space. In contrast to them, [7] has proposed an EM-like optimization solution, called MI-MDS to solve the problem with STRESS criteria in (26), which found embedding of approximating to the distance rather than the inner product as in CMDS. In addition to that, [6] has proposed a heuristic method, called HE-MI, to lower the time cost of MI-MDS.…”

Section: Related Work (mentioning)

confidence: 99%

“…MI-MDS is an iterative majorization algorithm proposed by [7] to minimize the STRESS value in (32), where all weights are assumed to be 1. It will find k nearest neighbors from in-sample points of a given out-of-sample point x at first, denoted as P = {p_1, p_2, p_3, …, p_k}.…”

Section: A Out-of-sample Problem and MI-MDS (mentioning)

confidence: 99%


A Robust and Scalable Solution for Interpolative Multidimensional Scaling with Weighting

Ruan, Fox 2013

2013 IEEE 9th International Conference on E-Science


Advances in modern bio-sequencing techniques have led to a proliferation of raw genomic data that enables an unprecedented opportunity for data mining. To analyze such large volume and high-dimensional scientific data, many high performance dimension reduction and clustering algorithms have been developed. Among the known algorithms, we use Multidimensional Scaling (MDS) to reduce the dimension of original data and Pairwise Clustering, and to classify the data. We have shown that an interpolative technique can be applied to get better performance on massive data. However, SMACOF MDS approach is only directly applicable to cases where all pairwise distances are used and where weight is one for each term. In this paper, we proposed a robust and scalable MDS and interpolation algorithm using Deterministic Annealing technique, to solve problems with either missing distances or a non-trivial weight function. We compared our method to three state-of-art techniques. By experimenting on three common types of bioinformatics dataset, the results illustrate that the precision of our algorithms are better than other algorithms, and the weighted solutions has a lower computational time cost as well.
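The citation statements in this entry describe MI-MDS as an iterative majorization that places one out-of-sample point by minimizing STRESS against its nearest in-sample neighbors, with all weights equal to 1. The following is a minimal sketch of that idea, not the authors' implementation: it uses the standard unit-weight majorization update for a single free point, and the function names, the centroid starting position, the iteration limits, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def stress(x, anchors, delta):
    """Unit-weight STRESS of one point x against its fixed neighbors."""
    d = np.linalg.norm(x - anchors, axis=1)
    return float(np.sum((d - delta) ** 2))


def interpolate_point(coords, dissim, k=4, iters=200, tol=1e-12):
    """Place one out-of-sample point in the existing low-dimensional map.

    coords: (n, dim) mapped positions of the in-sample points (kept fixed).
    dissim: (n,) original dissimilarities from the new point to each of them.
    """
    nearest = np.argsort(dissim)[:k]        # k nearest in-sample points
    anchors = coords[nearest]
    delta = dissim[nearest]
    centroid = anchors.mean(axis=0)
    x = centroid.copy()                     # assumed starting position
    for _ in range(iters):
        diff = x - anchors
        # Guard against division by zero if x coincides with a neighbor.
        dist = np.maximum(np.linalg.norm(diff, axis=1), 1e-12)
        # Majorization step: centroid plus the mean of the rescaled offsets.
        x_new = centroid + (delta / dist) @ diff / len(anchors)
        converged = np.linalg.norm(x_new - x) < tol
        x = x_new
        if converged:
            break
    return x, anchors, delta


# Toy data: four nearby mapped points, one far point, and a known target.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]])
target = np.array([0.3, 0.2])
dissim = np.linalg.norm(coords - target, axis=1)
x, anchors, delta = interpolate_point(coords, dissim, k=4)
print(x, stress(x, anchors, delta))
```

Because each new point is placed using only its k neighbors while the in-sample map stays fixed, the cost per point does not depend on the number of other out-of-sample points, which is the basis of the O(N) scaling mentioned in the Discussion citation statement.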
