Modern scientific collaborations have opened up the opportunity to solve complex problems that require both multidisciplinary expertise and large-scale computational experiments. These experiments typically consist of a sequence of processing steps that need to be executed on selected computing platforms. Execution poses a challenge, however, due to (1) the complexity and diversity of applications, (2) the diversity of analysis goals, (3) the heterogeneity of computing platforms, and (4) the volume and distribution of data. A common strategy to make these in silico experiments more manageable is to model them as workflows and to use a workflow management system to organize their execution. This article looks at the overall challenge posed by a new order of scientific experiments and the systems they need to be run on, and examines how this challenge can be addressed by workflows and workflow management systems. It proposes a taxonomy of workflow management system (WMS) characteristics, including aspects previously overlooked. This frames a review of prevalent WMSs used by the scientific community, elucidates their evolution to handle the challenges arising with the emergence of the “fourth paradigm,” and identifies research needed to maintain progress in this area.
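As a rough illustration of what "modelling an experiment as a workflow" can mean, the sketch below treats each processing step as a function and runs the steps in dependency order. It is a minimal sketch, not any particular WMS: the step names (`load`, `clean`, `analyse`), the `run_workflow` helper and the toy data are all hypothetical, and a real system would add scheduling, data movement and fault handling on top.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(steps, dependencies):
    """Run each step once all of its predecessors have produced a result.

    steps:        maps a step name to a function taking the results so far
    dependencies: maps a step name to the set of steps it depends on
    """
    results = {}
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        results[name] = steps[name](results)
    return order, results


# Hypothetical three-step experiment: load data, clean it, compute a mean.
steps = {
    "load": lambda r: [3, 1, 2],
    "clean": lambda r: sorted(r["load"]),
    "analyse": lambda r: sum(r["clean"]) / len(r["clean"]),
}
dependencies = {"load": set(), "clean": {"load"}, "analyse": {"clean"}}

order, results = run_workflow(steps, dependencies)
print(order, results["analyse"])
```

Separating the dependency graph from the step bodies is the design choice that lets a workflow management system decide where and when each step runs without changing the science code.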
The VERCE project has pioneered an e-Infrastructure to support researchers using established simulation codes on high-performance computers in conjunction with multiple sources of observational data. This is accessed and organised via the VERCE science gateway, which makes it convenient for seismologists to use these resources from any location via the Internet. Their data handling is made flexible and scalable by two Python libraries, ObsPy and dispel4py, and by data services delivered by ORFEUS and EUDAT. Provenance-driven tools enable rapid exploration of results and of the relationships between data, which accelerates understanding and method improvement. These powerful facilities are integrated and draw on many other e-Infrastructures. This paper presents the motivation for building such systems, reviews how solid-Earth scientists can make significant research progress using them, and explains the architecture and mechanisms that make their construction and operation achievable. We conclude with a summary of the achievements to date and identify the crucial steps needed to extend the capabilities for seismologists, for solid-Earth scientists and for similar disciplines.
This paper presents a data-intensive architecture that demonstrates the ability to support applications from a wide range of application domains, and support the different types of users involved in defining, designing and executing data-intensive processing tasks. The prototype architecture is introduced, and the pivotal role of DISPEL as a canonical language is explained. The architecture promotes the exploration and exploitation of distributed and heterogeneous data and spans the complete knowledge discovery process, from data preparation, to analysis, to evaluation and reiteration. The architecture evaluation included large-scale applications from astronomy, cosmology, hydrology, functional genetics, imaging processing and seismology.
M. Galea, Q. Shen and J. Levine. Evolutionary approaches to fuzzy modelling. Knowledge Engineering Review, 19(1):27-59, 2004. An overview of the application of evolutionary computation to fuzzy knowledge discovery is presented. This is set in one of two contexts: overcoming the knowledge acquisition bottleneck in the development of intelligent reasoning systems, and in the data mining of databases where the aim is the discovery of new knowledge. The different strategies utilizing evolutionary algorithms for knowledge acquisition are abstracted from the work reviewed. The simplest strategy runs an evolutionary algorithm once, while the iterative rule learning approach runs several evolutionary algorithms in succession, with the output from each considered a partial solution. Ensembles are formed by combining several classifiers generated by evolutionary techniques, while co-evolution is often used for evolving rule bases and associated membership functions simultaneously. The associated strengths and limitations of these induction strategies are compared and discussed. Ways in which evolutionary techniques have been adapted to satisfy the common evaluation criteria of the induced knowledge (classification accuracy, comprehensibility and novelty value) are also considered. The review concludes by highlighting common limitations of the experimental methodology used and indicating ways of resolving them.
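The iterative rule learning strategy described above can be sketched as follows. This is a toy under stated assumptions, not a method taken from the review: each rule is a single triangular fuzzy set over one numeric feature, the evolutionary algorithm is a bare mutate-and-select loop, and the names `membership`, `evolve_rule`, `iterative_rule_learning`, the 0.5 coverage threshold and the sample data are all hypothetical.

```python
import random


def membership(rule, x):
    """Triangular fuzzy set: 1 at the centre, falling linearly to 0 at centre +/- width."""
    centre, width = rule
    return max(0.0, 1.0 - abs(x - centre) / width)


def fitness(rule, positives, negatives):
    """Reward membership of positive examples, penalise membership of negatives."""
    return (sum(membership(rule, x) for x in positives)
            - sum(membership(rule, x) for x in negatives))


def evolve_rule(positives, negatives, rng, generations=40, pop_size=20):
    """One run of a very small evolutionary algorithm; returns its best rule."""
    values = positives + negatives
    lo, hi = min(values), max(values)
    pop = [(rng.uniform(lo, hi), rng.uniform(0.1, hi - lo + 0.1))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda r: fitness(r, positives, negatives), reverse=True)
        parents = pop[: pop_size // 2]
        # Gaussian mutation; the width is clamped so it stays positive.
        children = [(c + rng.gauss(0, 0.5), max(0.1, w + rng.gauss(0, 0.5)))
                    for c, w in parents]
        pop = parents + children
    return max(pop, key=lambda r: fitness(r, positives, negatives))


def iterative_rule_learning(positives, negatives, seed=0, threshold=0.5, max_rules=10):
    """Run the EA repeatedly; each run contributes one rule (a partial solution)."""
    rng = random.Random(seed)
    remaining, rules = list(positives), []
    while remaining and len(rules) < max_rules:
        rule = evolve_rule(remaining, negatives, rng)
        covered = [x for x in remaining if membership(rule, x) >= threshold]
        if not covered:
            break  # the latest run added nothing, so stop
        rules.append(rule)
        remaining = [x for x in remaining if x not in covered]
    return rules, remaining


positives = [1.0, 1.2, 1.4, 8.0, 8.3]
negatives = [4.0, 5.0]
rules, remaining = iterative_rule_learning(positives, negatives)
print(len(rules), "rules;", len(remaining), "positive examples left uncovered")
```

Removing the covered examples before the next run is what makes each rule a partial solution: later runs are pushed towards the examples that earlier rules missed.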