Cloud computing offers on-demand, scalable computing and storage, and has become an essential resource for the analysis of big biomedical data. The usual approach to cloud computing requires users to reserve and provision virtual servers. An emerging alternative is to have the provider allocate machine resources dynamically. This type of serverless computing has tremendous potential for biomedical research in terms of ease of use, instantaneous scalability, and cost effectiveness. In our proof-of-concept example, we demonstrate how serverless computing provides low-cost access to hundreds of CPUs, on demand, with little or no setup. In particular, we illustrate that the all-against-all pairwise comparison among all unique human proteins can be accomplished in approximately 2 minutes, at a cost of less than $1, using Amazon Web Services Lambda. We also demonstrate the feasibility of our approach using Google Cloud Functions and show that the same task of pairwise protein sequence comparison can be accomplished in approximately 11.5 minutes. In contrast, running the same task on a typical laptop computer required 8.7 hours.
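The fan-out pattern behind an all-against-all comparison can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the block size, and the k-mer similarity score are all assumptions made for illustration, and a local thread pool stands in for the serverless invocations. The idea is to split the upper triangle of the comparison matrix into block pairs, so that each block pair becomes one independent task.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations_with_replacement


def make_tasks(n_items, block_size):
    """Split the upper triangle of an n x n comparison matrix into block pairs."""
    starts = range(0, n_items, block_size)
    blocks = [(s, min(s + block_size, n_items)) for s in starts]
    # Each (row_block, col_block) pair is one independent unit of work.
    return list(combinations_with_replacement(blocks, 2))


def kmer_similarity(a, b, k=3):
    """Toy stand-in score: Jaccard similarity of k-mer sets (not a real aligner)."""
    ka = {a[i:i + k] for i in range(len(a) - k + 1)}
    kb = {b[i:i + k] for i in range(len(b) - k + 1)}
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def run_task(seqs, task):
    """Work done by one hypothetical function invocation."""
    (r0, r1), (c0, c1) = task
    out = {}
    for i in range(r0, r1):
        # Start at i + 1 inside a diagonal block so each pair is scored once.
        for j in range(max(c0, i + 1), c1):
            out[(i, j)] = kmer_similarity(seqs[i], seqs[j])
    return out


def all_pairs(seqs, block_size=2, workers=4):
    """Score every unordered pair; the thread pool stands in for cloud functions."""
    tasks = make_tasks(len(seqs), block_size)
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(lambda t: run_task(seqs, t), tasks):
            results.update(partial)
    return results


# Made-up example sequences, for illustration only.
scores = all_pairs(["MKTAYIAK", "MKTAYIAR", "GGGGGGGG", "MKTA", "AYIAKQ"])
```

Because the tasks share no state, the number of concurrent workers can grow with the number of block pairs, which is the property that makes this kind of workload a natural fit for serverless functions.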
We have used serverless AWS Lambda functions to align 640 million reads in less than 3 minutes, a speed-up of 500x over the single-threaded implementation. Using a hybrid cloud architecture and software modified to optimize disk transfers, an entire RNA sequencing workflow transforming multiplexed reads to transcript counts that originally took 29 hours can be completed in 18 minutes. This is a 100x improvement over the original single-threaded implementation and 12x faster than an optimized cloud server-based implementation using 16 threads. The total cost of the analyses is $2.82 for 96 wells, or 3 cents per multiplexed sample. This approach can be used for human datasets that are generated for single experiments and does not rely on processing large numbers of samples to achieve the performance gains. The workflow is publicly available under an MIT license (https://github.com/BioDepot/RNA-seqlambda).
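The rounded figures quoted above can be checked with a few lines of arithmetic. This is only a sanity check on the reported numbers; the variable and function names are illustrative.

```python
def speedup(before_minutes, after_minutes):
    """Ratio of the original runtime to the improved runtime."""
    return before_minutes / after_minutes


# 29 hours down to 18 minutes: roughly 97x, reported as about 100x.
workflow_speedup = speedup(29 * 60, 18)

# $2.82 spread over 96 wells: just under 3 cents per sample.
cost_per_sample_usd = 2.82 / 96

print(f"workflow speed-up: {workflow_speedup:.1f}x")
print(f"cost per sample: ${cost_per_sample_usd:.4f}")
```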