Abstract: Cell biology is increasingly focused on cellular heterogeneity and multicellular systems. To make the fullest use of experimental, clinical, and computational efforts, we need standardized data formats, community-curated "public data libraries", and tools to combine and analyze shared data. To address these needs, our multidisciplinary community created MultiCellDS (MultiCellular Data Standard): an extensible standard, a library of digital cell lines and tissue snapshots, and support software. With the help of experimentalists, clinicians, modelers, and data and library scientists, we can grow this seed into a community-owned ecosystem of shared data and tools, to the benefit of basic science, engineering, and human health.
bioRxiv preprint first posted online Dec. 9, 2016; doi: http://dx.doi.org/10.1101/090696. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.
Unmet needs for collecting and curating multicellular data
Biology is increasingly focused on studying cellular heterogeneity and multicellular systems. Novel experiments, clinical trials, and simulation studies are generating incredible amounts of data on cell behavior, cell-cell and cell-matrix interactions, and cellular microenvironmental conditions. These advances are creating exciting new opportunities to formulate and test hypotheses, while synthesizing these disparate data sources to gain a deeper tissue-level understanding of health and disease.
However, the deluge of data has pushed existing data sharing and analysis paradigms to their limits. Key insights are effectively hidden in plain sight: tucked away in images, graphs, and tables; divorced from context; and inaccessible to computer analysis without significant manual work. While some data are online, much more are trapped offline on researchers' flash drives, manually traded in emails, or inaccessible in private cloud storage. This severely limits data sharing, collaboration, and post-publication analyses that can offer new and unexpected insights.
There have been significant efforts to address these issues, but so far they have focused on describing genomic and molecular data (e.g., the Gene Ontology [1] for genetic data) or mathematical models (e.g., the Systems Biology Markup Language [2] for cell signaling models). None of these efforts have created a fixed data format for interchanging multicellular data or collected cell phenotype insights from many labs into shared, community-curated libraries with a uniform format. And while vast troves of experimental and clinical image data are available online to drive machine learning, we lack a standardized way to record extracted features, such as cell positions, sizes, shapes, and immunohistochemical stain statuses. Moreover, our lack of standardized data prevents us from directly linking between experimental and computational model systems, while also hindering our efforts to reconcile experimental and simulation results against...