SUMMARY
Interest in data streaming within scientific workflows has increased significantly in recent years, mainly due to the emergence of data-driven applications. Such applications can include data streaming from sensors and data coupling between scientific simulations. To support the resource management needed to enact such streaming-based workflows, autonomic computing techniques for transmission have been combined with in-transit processing, so that data elements may be processed in advance, en route, prior to arrival at the destination. We propose the integration of an autonomic data streaming service (ADSS) with in-transit processing into a workflow specification. This integration may imply that the associated runtime resource allocation depends on environmental conditions and can change between enactments of the same workflow. In our proposal, workflow specifications are independent of the constraints imposed by resource allocation. We express our solutions in terms of Reference nets. We also implement an ADSS that uses a timed Reference net simulation to predict future states of the ADSS. This has two advantages: first, the Reference net that implements the ADSS and the timed model coincide; second, the token distribution obtained from the Petri net implementation can be used to better understand the demand for particular types of resources in the system.
The ability to support Quality of Service (QoS) constraints is an important requirement in some scientific applications. With the increasing use of Cloud computing infrastructures, where access to resources is shared, dynamic and provisioned on-demand, identifying how QoS constraints can be supported becomes an important challenge. However, access to dedicated resources is often not possible in existing Cloud deployments and limited QoS guarantees are provided by many commercial providers (often restricted to error rate and availability, rather than particular QoS metrics such as latency or access time). We propose a workflow system architecture which enforces QoS for the simultaneous execution of multiple scientific workflows over a shared infrastructure (such as a Cloud environment). Our approach involves multiple pipeline workflow instances, with each instance having its own QoS requirements. These workflows are composed of a number of stages, with each stage being mapped to one or more physical resources. A stage involves a combination of data access, computation and data transfer capability. A token bucket-based data throttling framework is embedded into the workflow system architecture. Each workflow instance stage regulates the amount of data that is injected into the shared resources, allowing for bursts of data to be injected while at the same time providing isolation of workflow streams. We demonstrate our approach by using the Montage workflow, and develop a Reference net model of the workflow.
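The token bucket mechanism mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `TokenBucket`, the method `try_send`, and the parameters `rate` and `capacity` are hypothetical names chosen for illustration, and time is passed in explicitly so the behaviour is deterministic. The sketch assumes one bucket per workflow instance stage, which is what lets a stage send a burst up to `capacity` while its long-run injection rate stays bounded by `rate`.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Illustrative token bucket; all names here are assumptions."""
    rate: float          # tokens (e.g. data units) added per second
    capacity: float      # maximum number of stored tokens, i.e. burst size
    tokens: float = 0.0  # tokens currently available
    last: float = 0.0    # time of the last refill, in seconds

    def _refill(self, now: float) -> None:
        # Add tokens for the elapsed time, never exceeding the capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def try_send(self, size: float, now: float) -> bool:
        # Admit `size` units of data only if enough tokens are available.
        self._refill(now)
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False


# Two stages of different workflow instances, each with its own bucket.
stage_a = TokenBucket(rate=10, capacity=50, tokens=50)
stage_b = TokenBucket(rate=10, capacity=50, tokens=50)

burst_ok = stage_a.try_send(40, now=0.0)      # burst within capacity
too_soon = stage_a.try_send(20, now=0.5)      # not enough tokens yet
after_wait = stage_a.try_send(20, now=1.0)    # refilled enough by now
isolated = stage_b.try_send(50, now=1.0)      # unaffected by stage_a
```

Because each stage draws only from its own bucket, exhausting `stage_a` does not reduce what `stage_b` may inject, which is the isolation property the abstract describes. How refused data is handled (queued, delayed, or dropped) is left open in this sketch.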