Next-generation sequencing technologies have made it possible to sequence many genomes. Because demand keeps increasing and many of the required analyses are inherently parallel, these bioinformatics applications should ideally run on clusters, clouds and/or grids. We present a modified annotation framework that achieves a speed-up of 45x with 50 workers on a Caenorhabditis japonica test case. We also evaluate these modifications within the Amazon EC2 cloud framework. The underlying genome annotation tool (MAKER) is parallelised as an MPI application. Our framework enables it to run without MPI while utilising a wide variety of distributed computing resources. This parallel framework also allows easy, explicit data transfer, which helps overcome a major limitation of bioinformatics tools: they often rely on shared file systems. Combined, these features mean that our proposed framework can be used, even during early stages of development, to easily run sequence analysis tools on clusters, grids and clouds.
Public cloud services rely on virtualization to support multitenancy: customers from different organizations are allowed to share the data center infrastructure. Unfortunately, today's public clouds fail to provide sufficient isolation. Hardware resources are often multiplexed between virtual machines that belong to different customers, and these machines can interfere with one another's performance. This article characterizes the interference on an important metric, the network latency between virtual machines, and shows that Amazon's EC2 cloud, a leading public cloud provider, suffers from a long-tail latency problem. The root cause of this problem is the co-scheduling of CPU-bound and latency-sensitive tasks. We leverage these observations in Bobtail, a system that allows cloud customers to proactively detect and avoid these bad neighboring virtual machines without any help from cloud service providers.