An Information-Aware Framework for Exploring Multivariate Data Sets

Biswas, Ayan; Dutta, Soumya; Shen, Han‐Wei; Woodring, Jonathan

doi:10.1109/tvcg.2013.133

Cited by 64 publications

(46 citation statements)

References 37 publications

“…(3) We show that in an environment where data is stored on geographically distributed repositories, our method is able to speed up the correlation analysis process compared to a simple method that does not use any indexing. (4) We show that if correlation analysis is performed over samples, and not the entire dataset, what kind of speedup we can achieve and how much accuracy is lost.…”

Section: Results (mentioning)

confidence: 95%

“…Much of the existing work, especially in data visualization, has focused on individual variable analysis. However, more recently, several efforts [4,26] have focused on studying the relationship among multiple variables and making interesting scientific discoveries based on such analysis. This paper focuses on the problem of correlation analysis on massive scientific datasets in parallel and distributed settings.…”

Section: Introduction (mentioning)

confidence: 99%

Supporting correlation analysis on scientific datasets in parallel and distributed settings

Agrawal, Woodring, et al. 2014

Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing

Self Cite

With growing computational capabilities of parallel machines, scientific simulations are being performed at finer spatial and temporal scales, leading to a data explosion. Careful analysis of this data holds much promise for future scientific discoveries. Particularly, correlation analysis, which focuses on studying the potential relationships among multiple variables, is becoming a useful method for scientific analysis. This paper focuses on the problem of correlation analysis across large-scale simulation datasets, including 1) accelerating this analysis with the use of bitmap indexing as a representative summary of the data, 2) developing efficient algorithms for parallel execution, 3) performing analysis in distributed environments, i.e., for cases where different attributes are stored in geographically distributed repositories, and 4) combining sampling with correlation analysis. These algorithms have been implemented in a system that provides a high-level API for specification of the analyses, including allowing correlation analysis on specified value-based and dimension-based subsets of the data, and supports interactive and incremental analysis. We have extensively evaluated our framework for efficiency, and have also carried out case studies with domain scientists to establish how it can aid the data-driven discovery process.

“…Chen et al [4] and Sukharev et al [28] utilize visualization techniques to show correlations in time-varying multivariate climate datasets with 3D spatial references. More recently, also for time-varying multivariate data, Biswas et al [3] used mutual information and information overlap as correlation. They utilized our layout optimization technique [33] to construct a complete connected graph for all variables.…”

Section: Correlation Visualization - A View In Contrast (mentioning)

confidence: 99%

Visual Correlation Analysis of Numerical and Categorical Data on the Correlation Map

Zhang, McDonnell, Zadok, et al. 2015

IEEE Trans. Visual. Comput. Graphics

Correlation analysis can reveal the complex relationships that often exist among the variables in multivariate data. However, as the number of variables grows, it can be difficult to gain a good understanding of the correlation landscape and important intricate relationships might be missed. We previously introduced a technique that arranged the variables into a 2D layout, encoding their pairwise correlations. We then used this layout as a network for the interactive ordering of axes in parallel coordinate displays. Our current work expresses the layout as a correlation map and employs it for visual correlation analysis. In contrast to matrix displays where correlations are indicated at intersections of rows and columns, our map conveys correlations by spatial proximity which is more direct and more focused on the variables in play. We make the following new contributions, some unique to our map: (1) we devise mechanisms that handle both categorical and numerical variables within a unified framework, (2) we achieve scalability for large numbers of variables via a multi-scale semantic zooming approach, (3) we provide interactive techniques for exploring the impact of value bracketing on correlations, and (4) we visualize data relations within the sub-spaces spanned by correlated variables by projecting the data into a corresponding tessellation of the map.

“…Jänicke et al [26] extract local flow patterns as nodes in graph, and their transitions as edges where users can track features over time. There are some work visualizing the attribute relationship of scalar field using graph-like form [40,3].…”

Section: Exploration On Flow Field Data (mentioning)

confidence: 99%

FLDA: Latent Dirichlet Allocation Based Unsteady Flow Analysis

Fan, Lai, Guo, et al. 2014

IEEE Trans. Visual. Comput. Graphics

In this paper, we present a novel feature extraction approach called FLDA for unsteady flow fields based on Latent Dirichlet allocation (LDA) model. Analogous to topic modeling in text analysis, in our approach, pathlines and features in a given flow field are defined as documents and words respectively. Flow topics are then extracted based on Latent Dirichlet allocation. Different from other feature extraction methods, our approach clusters pathlines with probabilistic assignment, and aggregates features to meaningful topics at the same time. We build a prototype system to support exploration of unsteady flow field with our proposed LDA-based method. Interactive techniques are also developed to explore the extracted topics and to gain insight from the data. We conduct case studies to demonstrate the effectiveness of our proposed approach.
