Dennis Gannon scite author profile

A survey of data provenance in e-science

Data management is growing in complexity as large-scale applications take advantage of the loosely coupled resources brought together by grid middleware and by abundant storage capacity. Metadata describing the data products used in and generated by these applications is essential to disambiguate the data and enable reuse. Data provenance, one kind of metadata, pertains to the derivation history of a data product starting from its original sources. In this paper we create a taxonomy of data provenance characteristics and apply it to current research efforts in e-science, focusing primarily on scientific workflow approaches. The main aspect of our taxonomy categorizes provenance systems based on why they record provenance, what they describe, how they represent and store provenance, and ways to disseminate it. The survey culminates with an identification of open research problems in the field.

Workflows and e-Science: An overview of workflow system features and capabilities

Deelman

Gannon

Shields

et al. 2009

Future Generation Computer Systems

718

489

Examining the Challenges of Scientific Workflows

et al. 2007

The GrADS Project: Software Support for High-Level Grid Application Development

Berman

Chien

Cooper

et al. 2001

The International Journal of High Performance Computing Applica

249

133

Advances in networking technologies will soon make it possible to use the global information infrastructure in a qualitatively different way—as a computational as well as an information resource. As described in the recent book The Grid: Blueprint for a New Computing Infrastructure, this Grid will connect the nation’s computers, databases, instruments, and people in a seamless web of computing and distributed intelligence, which can be used in an on-demand fashion as a problem-solving resource in many fields of human endeavor—and, in particular, science and engineering. The availability of grid resources will give rise to dramatically new classes of applications, in which computing resources are no longer localized but, rather, distributed, heterogeneous, and dynamic; computation is increasingly sophisticated and multidisciplinary; and computation is integrated into our daily lives and, hence, subject to stricter time constraints than at present. The impact of these new applications will be pervasive, ranging from new systems for scientific inquiry, through computing support for crisis management, to the use of ambient computing to enhance personal mobile computing environments. To realize this vision, significant scientific and technical obstacles must be overcome. Principal among these is usability. The goal of the Grid Application Development Software (GrADS) project is to simplify distributed heterogeneous computing in the same way that the World Wide Web simplified information sharing over the Internet. To that end, the project is exploring the scientific and technical problems that must be solved to make it easier for ordinary scientific users to develop, execute, and tune applications on the Grid. 
In this paper, the authors describe the vision and strategies underlying the GrADS project, including the base software architecture for grid execution and performance monitoring, strategies and tools for construction of applications from libraries of grid-aware components, and development of innovative new science and engineering applications that can exploit these new technologies to run effectively in grid environments.

TeraGrid Science Gateways and Their Impact on Science

et al. 2008

A Framework for Collecting Provenance in Data-Centric Scientific Workflows

Simmhan

Plale

Gannon

2006

The increasing ability for the earth sciences to sense the world around us is resulting in a growing need for data-driven applications that are under the control of data-centric workflows composed of grid and web services. The focus of our work is on provenance collection for these workflows, necessary to validate the workflow and to determine quality of generated data products. The challenge we address is to record uniform and usable provenance metadata that meets the domain needs while minimizing the modification burden on the service authors and the performance overhead on the workflow engine and the services. The framework, based on a loosely-coupled publish-subscribe architecture for propagating provenance activities, satisfies the needs of detailed provenance collection while a performance evaluation of a prototype finds a minimal performance overhead (in the range of 1% for an eight service workflow using 271 data products).

Strategies for cache and local memory management by global program transformation

Gannon

Jalby

Gallivan

1988

Service-oriented environments for dynamically interacting with mesoscale weather

Droegemeier

Gannon

Reed

et al. 2005

Comput. Sci. Eng.

Each year across the US, mesoscale weather events (flash floods, tornadoes, hail, strong winds, lightning, and localized winter storms) cause hundreds of deaths, routinely disrupt transportation and commerce, and lead to economic losses averaging more than US$13 billion [1]. Although mitigating the impacts of such events would yield enormous economic and societal benefits, research leading to that goal is hindered by rigid IT frameworks that can't accommodate the real-time, on-demand, dynamically adaptive needs of mesoscale weather research; its disparate, high-volume data sets and streams; or the tremendous computational demands of its numerical models and data-assimilation systems. In response to the increasingly urgent need for a comprehensive national cyberinfrastructure in mesoscale meteorology, particularly one that can interoperate with those being developed in other relevant disciplines, the US National Science Foundation (NSF) funded a large information technology research (ITR) grant in 2003, known as Linked Environments for Atmospheric Discovery (LEAD). A multidisciplinary effort involving nine institutions and more than 100 scientists, students, and technical staff in meteorology, computer science, social science, and education, LEAD addresses the fundamental research challenges needed to create an integrated, scalable framework for adaptively analyzing and predicting the atmosphere. LEAD's foundation is dynamic workflow orchestration and data management in a Web services framework. These capabilities provide for the use of analysis tools, forecast models, and data repositories,

Dennis Gannon

A survey of data provenance in e-science

Workflows and e-Science: An overview of workflow system features and capabilities

Examining the Challenges of Scientific Workflows

The GrADS Project: Software Support for High-Level Grid Application Development

TeraGrid Science Gateways and Their Impact on Science

A Framework for Collecting Provenance in Data-Centric Scientific Workflows

Strategies for cache and local memory management by global program transformation

Service-oriented environments for dynamically interacting with mesoscale weather
