With the growing computational capabilities of parallel machines, scientific simulations are being performed at finer spatial and temporal scales, leading to a data explosion. Careful analysis of this data holds much promise for future scientific discoveries. In particular, correlation analysis, which studies the potential relationships among multiple variables, is becoming a useful method for scientific analysis. This paper focuses on the problem of correlation analysis across large-scale simulation datasets. Specifically, we address 1) accelerating this analysis by using bitmap indexing as a representative summary of the data, 2) developing efficient algorithms for parallel execution, 3) performing analysis in distributed environments, i.e., in cases where different attributes are stored in geographically distributed repositories, and 4) combining sampling with correlation analysis. These algorithms have been implemented in a system that provides a high-level API for specifying analyses, allows correlation analysis on specified value-based and dimension-based subsets of the data, and supports interactive and incremental analysis. We have extensively evaluated our framework for efficiency, and have also carried out case studies with domain scientists to establish how it can aid the data-driven discovery process.