Reliability is widely identified as an increasingly relevant issue in heterogeneous service-oriented systems because processor failure affects the quality of service delivered to users. Replication-based fault tolerance is a common approach to satisfying an application's reliability requirement. This study addresses the problem of minimizing redundancy while satisfying the reliability requirement of a directed acyclic graph (DAG)-based parallel application on heterogeneous service-oriented systems. We first propose the enough replication for redundancy minimization (ERRM) algorithm to satisfy the application's reliability requirement, and then propose the heuristic replication for redundancy minimization (HRRM) algorithm to satisfy the same requirement with low time complexity. Experimental results on real and randomly generated parallel applications at different scales, degrees of parallelism, and degrees of heterogeneity verify that ERRM generates the least redundancy, followed by HRRM and then the state-of-the-art MaxRe and RR algorithms. In addition, HRRM achieves approximately minimum redundancy with a short computation time.
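To make the idea of replication-based fault tolerance concrete, the following minimal Python sketch shows how replicating a single task raises its reliability, and how one might pick the fewest replicas that meet a target. It is an illustration only, not the ERRM or HRRM algorithm. The function name `min_replicas`, the independence assumption between replica failures, and the numeric values are all hypothetical.

```python
from math import prod


def min_replicas(reliabilities, target):
    """Pick the fewest replicas whose combined reliability meets `target`.

    reliabilities: reliability of one execution of the task on each
        candidate processor (hypothetical values in (0, 1)).
    target: required reliability for the task.

    Assumes replica failures are independent, so the task fails only if
    every replica fails: R = 1 - prod(1 - r_i).
    Returns the chosen reliabilities, or None if the target is unreachable.
    """
    chosen = []
    # For any fixed replica count, the most reliable processors give the
    # highest combined reliability, so adding them in descending order
    # yields the smallest count that reaches the target.
    for r in sorted(reliabilities, reverse=True):
        chosen.append(r)
        combined = 1.0 - prod(1.0 - x for x in chosen)
        if combined >= target:
            return chosen
    return None
```

For example, with candidate reliabilities 0.9, 0.8, and 0.95 and a target of 0.99, a single replica is not enough (0.95), while the two most reliable processors together give 1 - 0.05 × 0.1 = 0.995, so two replicas suffice under these assumptions.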