Hongbo Zou scite author profile

Increasingly large-scale applications are generating an unprecedented amount of data. However, the widening gap between computation and I/O capacity on High End Computing (HEC) machines creates a severe bottleneck for data analysis. Instead of moving data from its source to the output storage, in-situ analytics processes output data while simulations are running. However, in-situ data analysis incurs much more contention for computing resources with simulations, and such contention severely degrades simulation performance on HEC machines. Since different data processing strategies have different impacts on performance and cost, there is a consequent need for flexibility in the location of data analytics. In this paper, we explore and analyze several potential data analytics placement strategies along the I/O path. To find the best strategy for reducing data movement in a given situation, we propose a flexible data analytics (FlexAnalytics) framework. Based on this framework, a FlexAnalytics prototype system is developed for analytics placement. The FlexAnalytics system enhances the scalability and flexibility of the current I/O stack on HEC platforms and is useful for data preprocessing, runtime data analysis and visualization, as well as for large-scale data transfer. Two use cases, scientific data compression and remote visualization, are used in the study to verify the performance of FlexAnalytics. Experimental results demonstrate that the FlexAnalytics framework increases data transition bandwidth and improves the application's end-to-end transfer performance.

FlexQuery: An online query system for interactive remote visual data exploration at large scale

Zou, Schwan, Sławińska, et al. 2013

The remote visual exploration of live data generated by scientific simulations is useful for scientific discovery, performance monitoring, and online validation of the simulation results. Online visualization methods are challenged, however, by the continued growth in the volume of simulation output data that has to be transferred from its source (the simulation running on the high-end machine) to where it is analyzed, visualized, and displayed. A specific challenge in this context is the limited communication bandwidth between data sources and sinks. Previous work places queries 'near' data sources, exploiting their data reduction capabilities, but such work does not address the common scenario in which scientists make multiple different queries on the data being produced. This paper considers the general case in which science users are interested in different (sub)sets of the data produced by a high-end simulation. We offer the FlexQuery online data query system, which can deploy and execute data queries 'along' the I/O and analytics pipelines. FlexQuery carefully extends such analytics pipelines, using online performance monitoring and data location tracking, to realize data queries in ways that minimize additional data movement and offer low latency in data query execution. Using a real-world scientific application, the Maya astrophysics code and its analytics workflow, we demonstrate FlexQuery's ability to dynamically deploy queries for low-latency remote data visualization.

Exploring user engagement strategies and their impacts with social media mining: the case of public libraries

Zou, Chen, Dey 2015

Journal of Management Analytics

A Source-aware Interrupt Scheduling for Modern Parallel I/O Systems

Zou, Sun, et al. 2012

Hongbo Zou

FlexIO: I/O Middleware for Location-Flexible Scientific Data Analytics

FlexAnalytics: A Flexible Data Analytics Framework for Big Data Applications with I/O Performance Improvement
