Deep learning is transforming the analysis of biological images, but applying these models to large datasets remains challenging. Here we describe the DeepCell Kiosk, cloud-native software that dynamically scales deep learning workflows to accommodate large imaging datasets. To demonstrate the scalability and affordability of this software, we identified cell nuclei in 10^6 1-megapixel images in ~5.5 h for ~$250, with a sub-$100 cost achievable depending on cluster configuration. The DeepCell Kiosk can be downloaded at https://github.com/vanvalenlab/kiosk-console; a persistent deployment is available at https://deepcell.org.
Main Text
While deep learning is an increasingly popular approach to extracting quantitative information from biological images, its limitations significantly hinder its widespread adoption. Chief among these limitations are the requirements for large training datasets and substantial computational resources. Here, we sought to overcome the latter limitation. Although deep learning methods achieve remarkable accuracy on a range of image-analysis tasks, including classification [1], segmentation [2-4], and object tracking [5,6], their throughput is limited even with GPU acceleration. For example, typical GPU inference speeds for segmentation models on megapixel-scale images are 5-10 frames per second, which limits the scope of analyses that can be performed in a timely fashion. The necessary domain knowledge and the cost of GPUs pose further barriers to entry, although recent software packages [7-11] have attempted to address these two issues. While cloud computing has proven effective for other data types [12-15], scaling analyses of large imaging datasets in the cloud while constraining costs remains a considerable challenge. To meet this need, we developed the DeepCell Kiosk (Fig. 1a). This software package takes in configuration details (user authentication, GPU type, etc.)
and creates a cluster in the cloud that runs predefined deep learning-enabled image-analysis pipelines. This cluster is managed by Kubernetes, an open-source framework for running software containers (software bundled with its dependencies so that it can run as an isolated process) across a group of servers. Kubernetes can also be viewed as an operating system for cloud computing. Data is submitted to the cluster through a web-based front end, a command-line tool, or an ImageJ plugin. Once submitted, a dataset is placed in a database, from which the specified image-analysis pipeline picks it up, performs the desired analysis, and makes the results available for download. Results can be visualized with a variety of software tools [16,17]. To ensure that image-analysis pipelines run efficiently on this cluster, we made two software design choices. First, image-analysis pipelines access trained deep learning models through a centralized model server in the cluster. This strategy enables the cluster to efficiently allocate resources, as the various co...
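The submission flow described above (a front end places a dataset in a database, a pipeline picks it up, runs inference through a shared model server, and publishes results) is essentially a producer-consumer pattern. The following is a minimal single-process sketch using Python's standard `queue` and `threading` modules, assuming an in-memory queue in place of the Kiosk's database and threads in place of containers. `ModelServer`, `pipeline_worker`, `run_kiosk_demo`, and the thresholding "model" are hypothetical names for illustration only, not the Kiosk's actual components.

```python
import queue
import threading


class ModelServer:
    """Hypothetical stand-in for the cluster's centralized model server."""

    def __init__(self, models):
        self._models = models            # model name -> callable(image) -> prediction
        self._lock = threading.Lock()
        self.calls = 0                   # total inference requests served

    def predict(self, model_name, image):
        with self._lock:
            self.calls += 1
        return self._models[model_name](image)


def pipeline_worker(jobs, results, server, model_name):
    """Consumer: pull jobs, run inference through the shared server, store results."""
    while True:
        job = jobs.get()
        if job is None:                  # sentinel value tells this worker to exit
            return
        job_id, image = job
        # Each job writes a distinct key; single dict assignments are atomic in CPython.
        results[job_id] = server.predict(model_name, image)


def run_kiosk_demo(images, n_workers=3):
    jobs = queue.Queue()                 # plays the role of the job database
    results = {}                         # plays the role of downloadable outputs
    # Toy "segmentation" model: threshold each pixel value (placeholder, not a real model).
    server = ModelServer({"segmentation": lambda img: [int(p > 0.5) for p in img]})
    workers = [
        threading.Thread(target=pipeline_worker, args=(jobs, results, server, "segmentation"))
        for _ in range(n_workers)
    ]
    for w in workers:
        w.start()
    for job_id, image in enumerate(images):   # "front-end" submissions
        jobs.put((job_id, image))
    for _ in workers:
        jobs.put(None)                   # one shutdown sentinel per worker
    for w in workers:
        w.join()
    return results, server.calls
```

Routing every pipeline through a single `ModelServer` object mirrors the design choice in the text: inference hardware is held in one place and shared, rather than duplicated inside each pipeline.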
Live-cell imaging experiments have opened an exciting window into the behavior of living systems. While these experiments can produce rich data, the computational analysis of these datasets is challenging. Single-cell analysis requires that cells be accurately identified in each image and subsequently tracked over time. Increasingly, deep learning is being used to interpret microscopy images at single-cell resolution. In this work, we apply deep learning to the problem of tracking single cells in live-cell imaging data. Using crowdsourcing and a human-in-the-loop approach to data annotation, we constructed a dataset of over 11,000 trajectories of cell nuclei that includes lineage information. Using this dataset, we successfully trained a deep learning model to perform cell tracking within a linear programming framework. Benchmarking demonstrates that our method achieves state-of-the-art cell-tracking performance on multiple accuracy metrics. Further, we show that our deep learning-based method generalizes to tracking cells in both fluorescence and brightfield images of the cytoplasm, despite never having been trained on those data types. This enables analysis of live-cell imaging data collected across imaging modalities. A persistent cloud deployment of our cell tracker is available at http://www.deepcell.org.
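The abstract states only that tracking is performed "within a linear programming framework". One common formulation of that idea, assumed here rather than taken from the paper, treats frame-to-frame linking as a linear assignment problem on an augmented cost matrix in which births and deaths compete with links. In the sketch below, the link costs stand in for scores a trained model would produce (here a hypothetical hand-written matrix), and `birth_cost`/`death_cost` are hypothetical constants. For brevity, an exhaustive search is swapped in for a real solver; a practical tracker would use a polynomial-time method such as the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`).

```python
import itertools
import math


def build_cost_matrix(link_costs, birth_cost, death_cost):
    """Augment an n x m link-cost matrix into an (n+m) x (n+m) square matrix.

    Rows: n cells in frame t, then m "birth" dummies.
    Columns: m cells in frame t+1, then n "death" dummies.
    """
    n, m = len(link_costs), len(link_costs[0])
    size = n + m
    cost = [[math.inf] * size for _ in range(size)]  # inf marks a forbidden pairing
    for i in range(n):
        for j in range(m):
            cost[i][j] = link_costs[i][j]   # cell i (frame t) -> cell j (frame t+1)
        cost[i][m + i] = death_cost         # cell i has no successor
    for j in range(m):
        cost[n + j][j] = birth_cost         # cell j (frame t+1) has no predecessor
        for i in range(n):
            cost[n + j][m + i] = 0.0        # leftover dummies pair with each other for free
    return cost


def solve_assignment(cost):
    """Exact minimum-cost assignment by exhaustive search; only viable for tiny inputs."""
    best_perm, best_total = None, math.inf
    for perm in itertools.permutations(range(len(cost))):
        total = sum(cost[row][col] for row, col in enumerate(perm))
        if total < best_total:
            best_perm, best_total = perm, total
    return best_perm


def link_frames(link_costs, birth_cost=1.0, death_cost=1.0):
    """Return (links, births, deaths) between two consecutive frames."""
    n, m = len(link_costs), len(link_costs[0])
    perm = solve_assignment(build_cost_matrix(link_costs, birth_cost, death_cost))
    links = [(i, perm[i]) for i in range(n) if perm[i] < m]
    deaths = [i for i in range(n) if perm[i] >= m]
    births = [j for j in range(m) if perm[n + j] == j]
    return links, births, deaths
```

For example, `link_frames([[0.1, 0.9, 0.8], [0.9, 0.2, 0.7]], birth_cost=0.5, death_cost=0.5)` links both existing cells to their closest matches and marks the third cell in the next frame as newly appeared. Cell division, which the dataset's lineage annotations capture, would need an additional event type that this sketch does not model.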
Deep learning is transforming the ability of life scientists to extract information from images. These techniques have better accuracy than conventional approaches and enable previously impossible analyses. As the capability of deep learning methods expands, they are increasingly being applied to large imaging datasets. The computational demands of deep learning present a significant barrier to large-scale image analysis. To meet this challenge, we have developed DeepCell 2.0, a platform for deploying deep learning models on large imaging datasets (>10^5 megapixel images) in the cloud. This software enables the turnkey deployment of a Kubernetes cluster on all commonly used operating systems. By using a microservice architecture, our platform matches computational operations with their hardware requirements to reduce operating costs. Further, it scales computational resources to meet demand, drastically reducing the time necessary to analyze large datasets. A thorough analysis of costs demonstrates that cloud computing is economically competitive for this application. By treating hardware infrastructure as software, this work foreshadows a new generation of software packages for biology in which computational resources are allocated dynamically.
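The scaling and cost claims above reduce to simple arithmetic. The sketch below estimates how many GPUs a backlog needs to finish by a deadline, under the loudly stated assumption that throughput grows linearly with GPU count; the function names and the `max_gpus` cap are hypothetical, and this is not the platform's actual autoscaling logic. The example figures reuse numbers quoted for the DeepCell Kiosk (10^6 images, ~5.5 h, ~$250, and 5-10 frames per second per GPU).

```python
import math

SECONDS_PER_HOUR = 3600


def single_gpu_hours(n_images, images_per_sec):
    """Wall-clock hours for one GPU to process every image serially."""
    return n_images / images_per_sec / SECONDS_PER_HOUR


def gpus_for_deadline(n_images, images_per_sec_per_gpu, deadline_hours, max_gpus=None):
    """Hypothetical scaling rule: fewest GPUs that finish the backlog within the deadline.

    Assumes throughput grows linearly with GPU count (ignores I/O, queueing, and node start-up).
    """
    required_rate = n_images / (deadline_hours * SECONDS_PER_HOUR)  # images per second
    gpus = max(1, math.ceil(required_rate / images_per_sec_per_gpu))
    return gpus if max_gpus is None else min(gpus, max_gpus)


def cost_per_image(total_cost_usd, n_images):
    """Average spend per image for a completed run."""
    return total_cost_usd / n_images
```

With those figures, one GPU at 5-10 images/s would need roughly 28-56 h, while finishing in ~5.5 h would take about 6-11 GPUs under linear scaling, and ~$250 for 10^6 images works out to about $0.00025 per image. These are back-of-the-envelope estimates, not the cluster configuration reported for the Kiosk.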
In the version of this article initially published, the order of the layers in the items labeled Image(s) and Output in Fig. 1a was reversed, leading to black square outlines overlaid on the images. In Fig. 1b, lettering reading 2.4 GB and 240 MB (left panel) and 2.4 GB (right panel) should have read 2.4 Gbs, 240 Mbs, and 2.4 Gbs, respectively. The errors have been corrected in the PDF and HTML versions of the article.